Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems with RunPCA when npcs > 25 #8908

Open
milani31 opened this issue May 17, 2024 · 3 comments
Open

Problems with RunPCA when npcs > 25 #8908

milani31 opened this issue May 17, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@milani31
Copy link

Dear Seurat Team,
thank you for developing this amazing tool!

I am completely new to R and RNA-seq analysis and especially scRNA analysis.
I am trying to reproduce the steps of the PBMC 3K guided tutorial and encountered some issues while running it.
My R version, RStudio, as well as all the packages are freshly installed with the latest versions (Seurat 5.1.0, SeuratObject 5.0.2).
The pbmc3k dataset was downloaded from the link in the tutorial.
The code is diplayed below.

pbmc.data <- Read10X(datapath)
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)

top10 <- head(VariableFeatures(pbmc), 10)

plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2

all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))

print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
DimPlot(pbmc, reduction = "pca") + NoLegend()

When examining the PCA results, they are very different from what is expected. What concerns me most is the Dimensional reduction plot showing high exponential values on the x and y axis (see image).

PC_ 1
Positive: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2
Negative: SELL, IGFBP7, PSMC6, LIG1, ARRDC3
PC_ 2
Positive: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2
Negative: SELL, IGFBP7, PSMC6, LIG1, ARRDC3
PC_ 3
Positive: SELL, IGFBP7, PSMC6, LIG1, GPKOW
Negative: GZMK, NCR3, VPS13C, TMSB4X, MT-CO2
PC_ 4
Positive: GZMK, NCR3, VPS13C, TMSB4X, KLRG1
Negative: SELL, IGFBP7, PSMC6, LIG1, GPKOW
PC_ 5
Positive: PSMC6, SELL, IGFBP7, ARRDC3, GNB2
Negative: GZMK, NCR3, VPS13C, MT-CO2, ITSN2

pbmc3k_DimPlot_RunPCA_pscs_50

From this point on it is not possible to continue further analysis steps. The error appearing after trying to run UMAP hints that infinite values somehow are inside the input matrix but I honestly have no clue how or when this could have happened.

pbmc <- RunUMAP(pbmc, dims = 1:10)

Error in x2set(Xsub, n_neighbors, metric, nn_method = nn_sub, n_trees, :
Non-finite entries in the input matrix

However, when setting npcs = 25 in the RunPCA function the results are comparable the the tutorial output, even the Dimensional reduction plot.

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), npcs=25)

print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)

PC_ 1
Positive: CST3, TYROBP, LST1, AIF1, FTL
Negative: MALAT1, LTB, IL32, IL7R, CD2
PC_ 2
Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1
Negative: NKG7, PRF1, CST7, GZMB, GZMA
PC_ 3
Positive: PPBP, PF4, SDPR, SPARC, GNG11
Negative: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1
PC_ 4
Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1
Negative: VIM, IL7R, S100A6, IL32, S100A8
PC_ 5
Positive: LTB, IL7R, CKB, VIM, MS4A7
Negative: GZMB, NKG7, S100A8, FGFBP2, GNLY

I already downgraded to Seurat version 4.4.0 and SeuratObject version 4.1.4 but I still get the same problem. I also used other datasets and used SCTransform and even reinstalled R. I tried to find a solution online where that issue was mentioned somehow but there was no explanation what could be causing this and how to fix this. I would be very grateful for any suggestions.

sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.5.1 patchwork_1.2.0 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4

loaded via a namespace (and not attached):
[1] deldir_2.0-4 pbapply_1.7-2 gridExtra_2.3 rlang_1.1.3 magrittr_2.0.3 RcppAnnoy_0.0.22 matrixStats_1.3.0
[8] ggridges_0.5.6 compiler_4.4.0 spatstat.geom_3.2-9 png_0.1-8 vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1
[15] pkgconfig_2.0.3 fastmap_1.1.1 labeling_0.4.3 utf8_1.2.4 promises_1.3.0 purrr_1.0.2 jsonlite_1.8.8
[22] goftest_1.2-3 later_1.3.2 spatstat.utils_3.0-4 irlba_2.3.5.1 parallel_4.4.0 cluster_2.1.6 R6_2.5.1
[29] ica_1.0-3 stringi_1.8.4 RColorBrewer_1.1-3 spatstat.data_3.0-4 reticulate_1.36.1 parallelly_1.37.1 lmtest_0.9-40
[36] scattermore_1.2 Rcpp_1.0.12 tensor_1.5 future.apply_1.11.2 zoo_1.8-12 R.utils_2.12.3 sctransform_0.4.1
[43] httpuv_1.6.15 Matrix_1.6-5 splines_4.4.0 igraph_2.0.3 tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5
[50] spatstat.random_3.2-3 codetools_0.2-19 miniUI_0.1.1.1 spatstat.explore_3.2-7 listenv_0.9.1 lattice_0.22-5 tibble_3.2.1
[57] plyr_1.8.9 withr_3.0.0 shiny_1.8.1.1 ROCR_1.0-11 Rtsne_0.17 future_1.33.2 fastDummies_1.7.3
[64] survival_3.5-8 polyclip_1.10-6 fitdistrplus_1.1-11 pillar_1.9.0 KernSmooth_2.23-22 plotly_4.10.4 generics_0.1.3
[71] RcppHNSW_0.6.0 munsell_0.5.1 scales_1.3.0 globals_0.16.3 xtable_1.8-4 glue_1.7.0 lazyeval_0.2.2
[78] tools_4.4.0 data.table_1.15.4 RSpectra_0.16-1 RANN_2.6.1 leiden_0.4.3.1 dotCall64_1.1-1 cowplot_1.1.3
[85] grid_4.4.0 tidyr_1.3.1 colorspace_2.1-0 nlme_3.1-163 cli_3.6.2 spatstat.sparse_3.0-3 spam_2.10-0
[92] fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4 uwot_0.2.2 gtable_0.3.5 R.methodsS3_1.8.2 digest_0.6.35
[99] progressr_0.14.0 ggrepel_0.9.5 farver_2.1.1 htmlwidgets_1.6.4 R.oo_1.26.0 htmltools_0.5.8.1 lifecycle_1.0.4
[106] httr_1.4.7 mime_0.12 MASS_7.3-60.0.1

@milani31 milani31 added the bug Something isn't working label May 17, 2024
@longmanz
Copy link
Contributor

Hi,
I have rerun your commands but could not replicate the error. The PCA map from my end looks the same as the one from the vignette.
Could you confirm that your raw data was downloaded from this link (https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz)?

In addition, if you directly load the pbmc3k from the SeuratData package, do you also see the same error?

@milani31
Copy link
Author

Hi longmanz,
thanks for your reply. Yes, I get the same error in both cases. I also tried with 10X data from a publication I would like to reanalyze and the same problem always seems to occur.

@PonGanish
Copy link

PonGanish commented Jun 19, 2024

Hi,

I also face a similar issue when using the RunPCA function. It returns PCs calculated based on the same gene sets as shown below:

pbmc <- RunPCA(pbmc, verbose = TRUE)

PC_ 1
Positive:  CST3, FTL, BRK1, HNRNPA0, RPL23A, PPDPF, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, SLC25A5, ANXA6, ABRACL, TMSB4X, S100A9, HCST 
           RNH1, PTPRC, GNG5, ACAP1, ISCU, HCLS1, CYBA, RPL10A, FXYD5, TIMP1 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, SRGN, ALDOA, ISG15, GPSM3, PSME1, TOMM7, BTF3, ISG20 
           BTG1, RPSA, NBEAL1, HSPA8, SRSF5, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

PC_ 2
Positive:  FTL, BRK1, CST3, HNRNPA0, PCBP1, RPL23A, PPDPF, PTMA, FTH1, LYZ 
           MIF, LDHA, RPLP1, SLC25A5, RPS24, ANXA6, TMSB4X, HCST, ABRACL, S100A9 
           RNH1, PTPRC, GNG5, CYBA, ACAP1, HCLS1, ISCU, TIMP1, RPL10A, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, ATP5B 
           EEF1B2, DRAP1, ALDOA, SRGN, ISG15, GPSM3, BTF3, TOMM7, PSME1, ISG20 
           RPSA, BTG1, HSPA8, SRSF5, NBEAL1, GIMAP7, PARK7, UBL5, MT2A, SH3BGRL3

PC_ 3
Positive:  CST3, BRK1, FTL, HNRNPA0, PPDPF, RPL23A, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, ANXA6, SLC25A5, ABRACL, S100A9, TMSB4X, HCST 
           RNH1, GNG5, PTPRC, ACAP1, ISCU, HCLS1, CYBA, RPL10A, S100A10, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, TSC22D3, GMFG, LY6E, ANAPC16, YBX1, EEF1B2, CD99 
           ATP5B, DRAP1, ISG15, SRGN, ALDOA, GPSM3, PSME1, TOMM7, BTG1, BTF3 
           ISG20, NBEAL1, RPSA, HSPA8, GIMAP7, SRSF5, UBL5, MT2A, PARK7, U2AF1

PC_ 4
Positive:  CST3, FTL, BRK1, HNRNPA0, RPL23A, PPDPF, PCBP1, PTMA, FTH1, LYZ 
           RPLP1, MIF, LDHA, RPS24, SLC25A5, ANXA6, ABRACL, TMSB4X, S100A9, HCST 
           RNH1, GNG5, PTPRC, ACAP1, ISCU, HCLS1, CYBA, RPL10A, FXYD5, TIMP1 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, SRGN, ALDOA, ISG15, GPSM3, PSME1, TOMM7, BTF3, ISG20 
           BTG1, RPSA, NBEAL1, HSPA8, SRSF5, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

PC_ 5
Positive:  FTL, BRK1, CST3, HNRNPA0, PCBP1, RPL23A, PPDPF, PTMA, FTH1, LYZ 
           MIF, RPLP1, LDHA, SLC25A5, RPS24, ANXA6, TMSB4X, ABRACL, HCST, S100A9 
           RNH1, PTPRC, GNG5, ACAP1, CYBA, HCLS1, ISCU, RPL10A, TIMP1, FXYD5 
Negative:  RPSAP58, CCNI, CNBP, GMFG, TSC22D3, LY6E, ANAPC16, YBX1, CD99, EEF1B2 
           ATP5B, DRAP1, ALDOA, SRGN, ISG15, GPSM3, TOMM7, PSME1, BTF3, ISG20 
           RPSA, BTG1, HSPA8, SRSF5, NBEAL1, GIMAP7, UBL5, PARK7, MT2A, SH3BGRL3

After this, when I run the UMAP function, I get the error:

Error in x2set(Xsub, n_neighbors, metric, nn_method = nn_sub, n_trees, :
Non-finite entries in the input matrix

tried reinstalling R and RStudio, and also tried with different datasets, but still encounter the same issue.

SESSION INFO:
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RColorBrewer_1.1-3 patchwork_1.2.0    dplyr_1.1.4        sctransform_0.4.1  ggplot2_3.5.1      Seurat_5.1.0       SeuratObject_5.0.2 sp_2.1-4          

loaded via a namespace (and not attached):
  [1] jsonlite_1.8.8              magrittr_2.0.3              spatstat.utils_3.0-5        farver_2.1.2                zlibbioc_1.50.0            
  [6] vctrs_0.6.5                 ROCR_1.0-11                 DelayedMatrixStats_1.26.0   memoise_2.0.1               spatstat.explore_3.2-7     
 [11] S4Arrays_1.4.1              htmltools_0.5.8.1           SparseArray_1.4.8           parallelly_1.37.1           KernSmooth_2.23-24         
 [16] htmlwidgets_1.6.4           ica_1.0-3                   plyr_1.8.9                  plotly_4.10.4               zoo_1.8-12                 
 [21] cachem_1.1.0                igraph_2.0.3                mime_0.12                   lifecycle_1.0.4             pkgconfig_2.0.3            
 [26] Matrix_1.6-5                R6_2.5.1                    fastmap_1.2.0               MatrixGenerics_1.16.0       GenomeInfoDbData_1.2.12    
 [31] fitdistrplus_1.1-11         future_1.33.2               shiny_1.8.1.1               digest_0.6.35               colorspace_2.1-0           
 [36] AnnotationDbi_1.66.0        S4Vectors_0.42.0            tensor_1.5                  RSpectra_0.16-1             irlba_2.3.5.1              
 [41] GenomicRanges_1.56.1        RSQLite_2.3.7               labeling_0.4.3              progressr_0.14.0            fansi_1.0.6                
 [46] spatstat.sparse_3.0-3       httr_1.4.7                  polyclip_1.10-6             abind_1.4-5                 compiler_4.4.1             
 [51] bit64_4.0.5                 withr_3.0.0                 DBI_1.2.3                   fastDummies_1.7.3           R.utils_2.12.3             
 [56] MASS_7.3-60                 DelayedArray_0.30.1         tools_4.4.1                 lmtest_0.9-40               httpuv_1.6.15              
 [61] future.apply_1.11.2         goftest_1.2-3               glmGamPoi_1.16.0            R.oo_1.26.0                 glue_1.7.0                 
 [66] nlme_3.1-165                promises_1.3.0              grid_4.4.1                  Rtsne_0.17                  cluster_2.1.6              
 [71] reshape2_1.4.4              generics_0.1.3              gtable_0.3.5                spatstat.data_3.0-4         R.methodsS3_1.8.2          
 [76] tidyr_1.3.1                 data.table_1.15.4           utf8_1.2.4                  XVector_0.44.0              BiocGenerics_0.50.0        
 [81] spatstat.geom_3.2-9         RcppAnnoy_0.0.22            ggrepel_0.9.5               RANN_2.6.1                  pillar_1.9.0               
 [86] stringr_1.5.1               spam_2.10-0                 RcppHNSW_0.6.0              later_1.3.2                 splines_4.4.1              
 [91] lattice_0.22-5              survival_3.7-0              bit_4.0.5                   deldir_2.0-4                tidyselect_1.2.1           
 [96] Biostrings_2.72.1           miniUI_0.1.1.1              pbapply_1.7-2               gridExtra_2.3               IRanges_2.38.0             
[101] SummarizedExperiment_1.34.0 scattermore_1.2             stats4_4.4.1                Biobase_2.64.0              matrixStats_1.3.0          
[106] stringi_1.8.4               UCSC.utils_1.0.0            lazyeval_0.2.2              codetools_0.2-19            tibble_3.2.1               
[111] cli_3.6.2                   uwot_0.2.2                  xtable_1.8-4                reticulate_1.37.0           munsell_0.5.1              
[116] Rcpp_1.0.12                 GenomeInfoDb_1.40.1         globals_0.16.3              spatstat.random_3.2-3       png_0.1-8                  
[121] parallel_4.4.1              blob_1.2.4                  dotCall64_1.1-1             sparseMatrixStats_1.16.0    listenv_0.9.1              
[126] viridisLite_0.4.2           scales_1.3.0                ggridges_0.5.6              leiden_0.4.3.1              purrr_1.0.2                
[131] crayon_1.5.2                rlang_1.1.4                 cowplot_1.1.3               KEGGREST_1.44.0

Could the authors please clarify this as soon as possible?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants