Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finemap_loci works for some loci, but not others? #139

Open
dym22 opened this issue Apr 28, 2024 · 2 comments
Open

finemap_loci works for some loci, but not others? #139

dym22 opened this issue Apr 28, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@dym22
Copy link

dym22 commented Apr 28, 2024

1. Bug description

When running finemap_loci on my 11 topSNPs, it proceeds successfully with 4 of them but fails for the other 7. The error (when it occurs) appears to happen during Step 2: Extract Linkage Disequilibrium. Specifically, for 7 of the loci, the console outputs "invalid 'path' argument" right after attempting to query the VCF tabix file.

2. Reproducible example

Note: the error also occurs when not using the force_new_* arguments. I am just including them so that it will reproduce the error as if running from scratch.

results <- echolocatoR::finemap_loci(
  fullSS_path = "/ix/ccdg/storage3/dym22/echo/sumstats_formatted",
  topSNPs = "/ix/ccdg/storage3/dym22/echo/topSNPs",
  LD_reference = "1KGphase3" ,
  superpopulation = "EAS",
  dataset_name = "OFC2",
  fullSS_genome_build = "GRCh38",
  bp_distance = 5e5,
  finemap_methods = c("ABF","SUSIE","FINEMAP"),
  munged = TRUE,
  results_dir = "/ix/ccdg/storage3/dym22/echo/echo_results",
  force_new_LD = T,
  force_new_subset = T,
  force_new_finemap = T
)

Console output

When running correctly:

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Using 1000Genomes as LD reference panel.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
LD Reference Panel = 1KGphase3
Querying 1KG remote server.
Selecting 504 EAS individuals from 1kgphase3.
Performing liftover:  hg38  ==>  hg19
Using existing chain file.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Explicit format: 'vcf'
Performing liftover:  hg38  ==>  hg19
Using existing chain file.
Querying VCF tabix file.
Querying VCF file using: VariantAnnotation
Checking query chromosome style is correct.
Chromosome format: 1
Filtering query to 504 samples and returning ScanVcfParam object.
Retrieving data.
Time difference of 8.6 secs
Removing 610 / 29,392 non-overlapping SNPs.
Saving VCF subset ==> /scratch/slurm-3150011/Rtmppqbvy3/VCF/Rtmppqbvy3.chr13-34361699-35361036.ALL.chr13.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.bgz
Time difference of 0.4 secs
Retrieved data with 610 rows across 504 samples.
echoLD::snpStats:: `MAF` column already present.
echoLD:snpStats:: Computing pairwise LD between 610 SNPs across 504 individuals (stats = R).
Time difference of 0.7 secs
610 x 610 LD_matrix (sparse)
Converting obj to sparseMatrix.
Saving sparse LD matrix ==> /ix/ccdg/storage3/dym22/echo/echo_results/GWAS/OFC2/chr13_33713257_A_T/LD/chr13_33713257_A_T.1KGphase3_LD.RDS
+ FILTER:: Filtering by LD features.

When running incorrectly:

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Using 1000Genomes as LD reference panel.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
LD Reference Panel = 1KGphase3
Querying 1KG remote server.
Selecting 504 EAS individuals from 1kgphase3.
Performing liftover:  hg38  ==>  hg19
Using existing chain file.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Explicit format: 'vcf'
Performing liftover:  hg38  ==>  hg19
Using existing chain file.
Querying VCF tabix file.
invalid 'path' argumentLocus chr7_111475695_A_C complete in: 0.02 min

Data

[dym22@login0b echo]$ zcat sumstats_formatted.bgz | head
SNP	CHR	BP	A1	A2	ID	N	FRQ	T	SE_T	P	BETA	SE	P_INPUT	CONVERGE
rs376804038	1	73209	C	T	chr1:73209:C:T	1399	0.9874911	-1.11188	2.60166	0.669106	0.164271	0.384371	0.669106	1
rs377391031	1	98667	T	C	chr1:98667:T:C	1399	0.9899929	-2.41118	2.41094	0.317263	0.414817	0.414776	0.317263	1
rs1373207528	1	148488	A	G	chr1:148488:A:G	1399	0.00393100000000002	1.68691	1.57535	0.284251	-0.679735	0.63478	0.284251	1
rs1490700034	1	514484	C	T	chr1:514484:C:T	1399	0.9896355	-2.59702	2.4615	0.291399	0.428624	0.406257	0.291399	1
rs201764041	1	595259	G	A	chr1:595259:G:A	1399	0.9446033	-2.61526	5.48041	0.633218	0.0870741	0.182468	0.633218	1
rs61769339	1	727242	G	A	chr1:727242:G:A	1399	0.9124375	6.8175	6.82541	0.317872	-0.146341	0.146511	0.317872	1
rs138476838	1	732994	G	A	chr1:732994:G:A	1399	0.738385	6.5593	10.0037	0.512027	-0.0655441	0.0999627	0.512027	1
rs12238997	1	758351	A	G	chr1:758351:A:G	1399	0.9124375	6.3551	6.83119	0.352213	-0.136185	0.146387	0.352213	1
rs61769351	1	758443	G	C	chr1:758443:G:C	1399	0.915654	8.83591	6.7454	0.190224	-0.194194	0.148249	0.190224	1
[dym22@login0b echo]$ cat topSNPs
SNP,CHR,POS,A1,A2,Locus,N,Freq,T,SE_T,P,Effect,StdErr,P_INPUT,CONVERGE,Gene
rs529674375,12,101376840,G,A,chr12_101376840_G_A,1399,0.9656898,-18.8586,4.1128,4.53268e-06,1.11489,0.24349,4.67656e-06,1,chr12_101376840_G_A
rs9668896,12,93677236,C,T,chr12_93677236_C_T,1399,0.9417441,-26.9601,5.52197,1.04842e-06,0.884163,0.180987,1.03305e-06,1,chr12_93677236_C_T
rs74399411,13,106424444,T,C,chr13_106424444_T_C,1399,0.9889207,-11.5852,2.31447,5.57018e-07,2.16272,0.451328,1.65199e-06,1,chr13_106424444_T_C
rs56360313,13,33713257,A,T,chr13_33713257_A_T,1399,0.68549,51.7592,11.1113,3.18911e-06,-0.419235,0.0903915,3.51809e-06,1,chr13_33713257_A_T
rs4329516,1,209849556,C,T,chr1_209849556_C_T,1399,0.74589,-56.3087,10.6266,1.16535e-07,0.49864,0.093617,1.00184e-07,1,chr1_209849556_C_T
rs115562318,1,233144048,G,A,chr1_233144048_G_A,1399,0.9292352,-28.9316,6.12655,2.3315e-06,0.770799,0.164069,2.62692e-06,1,chr1_233144048_G_A
rs55901108,1,62226519,G,T,chr1_62226519_G_T,1399,0.9792709,-16.5888,3.45108,1.53327e-06,1.39286,0.290675,1.65297e-06,1,chr1_62226519_G_T
rs17461953,1,94085894,A,C,chr1_94085894_A_C,1399,0.832023,46.0283,8.81692,1.78488e-07,-0.592094,0.113384,1.76992e-07,1,chr1_94085894_A_C
rs6812051,4,76569817,A,G,chr4_76569817_A_G,1399,0.467119,-61.9498,11.8453,1.69601e-07,0.441516,0.0843009,1.62866e-07,1,chr4_76569817_A_G
rs9446804,6,72986179,G,A,chr6_72986179_G_A,1399,0.753038,-55.8363,10.1128,3.36418e-08,0.545975,0.0992578,3.78562e-08,1,chr6_72986179_G_A
rs143676388,7,111475695,A,C,chr7_111475695_A_C,1399,0.9446033,-26.4509,5.36831,8.34029e-07,0.917837,0.186902,9.07043e-07,1,chr7_111475695_A_C

3. Session info

> utils::sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.4.2 
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_2.0.2        snpStats_1.50.0     Matrix_1.6-4        survival_3.5-5      topr_2.0.0         
 [6] MungeSumstats_1.8.0 lubridate_1.9.2     forcats_1.0.0       stringr_1.5.1       dplyr_1.1.4        
[11] purrr_1.0.2         readr_2.1.5         tidyr_1.3.1         tibble_3.2.1        ggplot2_3.5.0      
[16] tidyverse_2.0.0     data.table_1.15.4   echolocatoR_2.0.3  

loaded via a namespace (and not attached):
  [1] fs_1.6.3                                    ProtGenerics_1.34.0                        
  [3] matrixStats_1.3.0                           bitops_1.0-7                               
  [5] EnsDb.Hsapiens.v75_2.99.0                   httr_1.4.7                                 
  [7] RColorBrewer_1.1-3                          Rgraphviz_2.44.0                           
  [9] tools_4.3.0                                 backports_1.4.1                            
 [11] utf8_1.2.4                                  R6_2.5.1                                   
 [13] DT_0.27                                     lazyeval_0.2.2                             
 [15] withr_3.0.0                                 prettyunits_1.2.0                          
 [17] GGally_2.2.1                                gridExtra_2.3                              
 [19] textshaping_0.3.6                           cli_3.6.2                                  
 [21] Biobase_2.62.0                              labeling_0.4.3                             
 [23] ggbio_1.50.0                                mvtnorm_1.2-4                              
 [25] proxy_0.4-27                                mixsqp_0.3-54                              
 [27] systemfonts_1.0.4                           Rsamtools_2.16.0                           
 [29] foreign_0.8-84                              R.utils_2.12.3                             
 [31] dichromat_2.0-0.1                           styler_1.9.1                               
 [33] BSgenome_1.70.2                             maps_3.4.2                                 
 [35] readxl_1.4.3                                susieR_0.12.35                             
 [37] rstudioapi_0.16.0                           RSQLite_2.3.6                              
 [39] httpcode_0.3.0                              pals_1.8                                   
 [41] generics_0.1.3                              BiocIO_1.12.0                              
 [43] echoconda_0.99.9                            zip_2.3.0                                  
 [45] fansi_1.0.6                                 DescTools_0.99.54                          
 [47] S4Vectors_0.40.2                            catalogueR_1.0.1                           
 [49] abind_1.4-5                                 R.methodsS3_1.8.2                          
 [51] lifecycle_1.0.4                             yaml_2.3.8                                 
 [53] SummarizedExperiment_1.32.0                 SparseArray_1.2.4                          
 [55] BiocFileCache_2.10.1                        echoplot_0.99.7                            
 [57] grid_4.3.0                                  blob_1.2.4                                 
 [59] crayon_1.5.2                                dir.expiry_1.8.0                           
 [61] lattice_0.21-8                              GenomicFeatures_1.54.4                     
 [63] KEGGREST_1.42.0                             mapproj_1.2.11                             
 [65] pillar_1.9.0                                knitr_1.46                                 
 [67] GenomicRanges_1.54.1                        rjson_0.2.21                               
 [69] osfr_0.2.9                                  boot_1.3-28.1                              
 [71] gld_2.6.6                                   SNPlocs.Hsapiens.dbSNP155.GRCh37_0.99.24   
 [73] SNPlocs.Hsapiens.dbSNP155.GRCh38_0.99.24    codetools_0.2-19                           
 [75] glue_1.7.0                                  remotes_2.4.2.1                            
 [77] coloc_5.2.3                                 vctrs_0.6.5                                
 [79] png_0.1-8                                   XGR_1.1.9                                  
 [81] cellranger_1.1.0                            gtable_0.3.5                               
 [83] assertthat_0.2.1                            cachem_1.0.8                               
 [85] dnet_1.1.7                                  xfun_0.43                                  
 [87] openxlsx_4.2.5.2                            S4Arrays_1.2.1                             
 [89] gargle_1.4.0                                nlme_3.1-162                               
 [91] usethis_2.1.6                               bit64_4.0.5                                
 [93] progress_1.2.3                              filelock_1.0.3                             
 [95] googleAuthR_2.0.1                           GenomeInfoDb_1.38.8                        
 [97] R.cache_0.16.0                              irlba_2.3.5.1                              
 [99] rpart_4.1.19                                colorspace_2.1-0                           
[101] BiocGenerics_0.48.1                         DBI_1.2.2                                  
[103] Hmisc_5.1-2                                 nnet_7.3-18                                
[105] processx_3.8.1                              Exact_3.2                                  
[107] tidyselect_1.2.1                            bit_4.0.5                                  
[109] compiler_4.3.0                              curl_5.0.0                                 
[111] graph_1.78.0                                htmlTable_2.4.2                            
[113] expm_0.999-9                                basilisk.utils_1.12.0                      
[115] xml2_1.3.4                                  DelayedArray_0.28.0                        
[117] rtracklayer_1.62.0                          checkmate_2.3.1                            
[119] scales_1.3.0                                hexbin_1.28.3                              
[121] callr_3.7.3                                 RBGL_1.78.0                                
[123] echoLD_0.99.9                               RCircos_1.2.2                              
[125] rappdirs_0.3.3                              supraHex_1.40.0                            
[127] digest_0.6.35                               piggyback_0.1.5                            
[129] rmarkdown_2.26                              basilisk_1.12.0                            
[131] XVector_0.42.0                              BSgenome.Hsapiens.1000genomes.hs37d5_0.99.1
[133] htmltools_0.5.8.1                           pkgconfig_2.0.3                            
[135] base64enc_0.1-3                             MatrixGenerics_1.14.0                      
[137] echodata_0.99.17                            dbplyr_2.5.0                               
[139] fastmap_1.1.1                               ensembldb_2.26.0                           
[141] rlang_1.1.3                                 htmlwidgets_1.6.4                          
[143] farver_2.1.1                                echofinemap_0.99.5                         
[145] jsonlite_1.8.8                              BiocParallel_1.36.0                        
[147] R.oo_1.26.0                                 VariantAnnotation_1.46.0                   
[149] RCurl_1.98-1.12                             magrittr_2.0.3                             
[151] Formula_1.2-5                               GenomeInfoDbData_1.2.11                    
[153] ggnetwork_0.5.13                            patchwork_1.2.0                            
[155] munsell_0.5.1                               Rcpp_1.0.12                                
[157] ape_5.7-1                                   ggnewscale_0.4.10                          
[159] viridis_0.6.5                               reticulate_1.28                            
[161] stringi_1.8.3                               rootSolve_1.8.2.4                          
[163] zlibbioc_1.48.2                             MASS_7.3-59                                
[165] plyr_1.8.9                                  ggstats_0.5.1                              
[167] parallel_4.3.0                              ggrepel_0.9.5                              
[169] lmom_3.0                                    echoannot_0.99.11                          
[171] Biostrings_2.70.3                           splines_4.3.0                              
[173] hms_1.1.3                                   ps_1.7.5                                   
[175] igraph_1.4.2                                BSgenome.Hsapiens.NCBI.GRCh38_1.3.1000     
[177] reshape2_1.4.4                              biomaRt_2.58.2                             
[179] stats4_4.3.0                                crul_1.4.0                                 
[181] XML_3.99-0.14                               evaluate_0.23                              
[183] biovizBase_1.50.0                           BiocManager_1.30.22                        
[185] tzdb_0.4.0                                  reshape_0.8.9                              
[187] echotabix_0.99.10                           restfulr_0.0.15                            
[189] AnnotationFilter_1.26.0                     e1071_1.7-14                               
[191] downloadR_0.99.6                            ragg_1.2.5                                 
[193] viridisLite_0.4.2                           class_7.3-21                               
[195] OrganismDbi_1.44.0                          memoise_2.0.1                              
[197] AnnotationDbi_1.64.1                        GenomicAlignments_1.38.2                   
[199] IRanges_2.36.0                              cluster_2.1.4                              
[201] timechange_0.2.0 
@dym22 dym22 added the bug Something isn't working label Apr 28, 2024
@bschilder bschilder self-assigned this May 10, 2024
@bschilder bschilder moved this from Todo to In Progress in 🦇🦇 echoverse 🦇🦇 May 10, 2024
@bschilder
Copy link
Member

It's possible this is related to your sumstats being in hg38, while 1KG is in hg19. The liftover step is imperfect and can result in losing certain SNPs:
RajLabMSSM/echoLD#11

Will look into whether there's a way to make the liftover more robust (if that is indeed the problem), or at very least provide the user with a more informative error.

@dym22
Copy link
Author

dym22 commented May 10, 2024

I had not considered that but pretty easy fix--just ran MungeSumstats::liftover to get them in 19, adjusted everything else accordingly, and it ran without incident.

I'm going to try to get the in-sample LD version running when I have a little more time, but very relieved to at least have something to work with. Thank you so much for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: In Progress
Development

No branches or pull requests

2 participants