1. Bug description
When running finemap_loci on my 11 topSNPs, it proceeds successfully with 4 of them but fails for the other 7. The error (when it occurs) appears to happen during Step 2: Extract Linkage Disequilibrium. Specifically, for 7 of the loci, the console outputs "invalid 'path' argument" right after attempting to query the VCF tabix file.
2. Reproducible example
Note: the error also occurs when not using the force_new_* arguments. I am just including them so that it will reproduce the error as if running from scratch.
Console output
When running correctly:
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Using 1000Genomes as LD reference panel.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
LD Reference Panel = 1KGphase3
Querying 1KG remote server.
Selecting 504 EAS individuals from 1kgphase3.
Performing liftover: hg38 ==> hg19
Using existing chain file.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Explicit format: 'vcf'
Performing liftover: hg38 ==> hg19
Using existing chain file.
Querying VCF tabix file.
Querying VCF file using: VariantAnnotation
Checking query chromosome style is correct.
Chromosome format: 1
Filtering query to 504 samples and returning ScanVcfParam object.
Retrieving data.
Time difference of 8.6 secs
Removing 610 / 29,392 non-overlapping SNPs.
Saving VCF subset ==> /scratch/slurm-3150011/Rtmppqbvy3/VCF/Rtmppqbvy3.chr13-34361699-35361036.ALL.chr13.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.bgz
Time difference of 0.4 secs
Retrieved data with 610 rows across 504 samples.
echoLD::snpStats:: `MAF` column already present.
echoLD:snpStats:: Computing pairwise LD between 610 SNPs across 504 individuals (stats = R).
Time difference of 0.7 secs
610 x 610 LD_matrix (sparse)
Converting obj to sparseMatrix.
Saving sparse LD matrix ==> /ix/ccdg/storage3/dym22/echo/echo_results/GWAS/OFC2/chr13_33713257_A_T/LD/chr13_33713257_A_T.1KGphase3_LD.RDS
+ FILTER:: Filtering by LD features.
When running incorrectly:
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
── Step 2 ▶▶▶ Extract Linkage Disequilibrium 🔗 ───────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
LD_reference identified as: 1kg.
Using 1000Genomes as LD reference panel.
Constructing GRanges query using min/max ranges across one or more chromosomes.
+ as_blocks=TRUE: Will query a single range per chromosome that covers all regions requested (plus anything in between).
LD Reference Panel = 1KGphase3
Querying 1KG remote server.
Selecting 504 EAS individuals from 1kgphase3.
Performing liftover: hg38 ==> hg19
Using existing chain file.
========= echotabix::query =========
query_dat is already a GRanges object. Returning directly.
Explicit format: 'vcf'
Performing liftover: hg38 ==> hg19
Using existing chain file.
Querying VCF tabix file.
invalid 'path' argument
Locus chr7_111475695_A_C complete in: 0.02 min
Data
[dym22@login0b echo]$ zcat sumstats_formatted.bgz | head
SNP CHR BP A1 A2 ID N FRQ T SE_T P BETA SE P_INPUT CONVERGE
rs376804038 1 73209 C T chr1:73209:C:T 1399 0.9874911 -1.11188 2.60166 0.669106 0.164271 0.384371 0.669106 1
rs377391031 1 98667 T C chr1:98667:T:C 1399 0.9899929 -2.41118 2.41094 0.317263 0.414817 0.414776 0.317263 1
rs1373207528 1 148488 A G chr1:148488:A:G 1399 0.00393100000000002 1.68691 1.57535 0.284251 -0.679735 0.63478 0.284251 1
rs1490700034 1 514484 C T chr1:514484:C:T 1399 0.9896355 -2.59702 2.4615 0.291399 0.428624 0.406257 0.291399 1
rs201764041 1 595259 G A chr1:595259:G:A 1399 0.9446033 -2.61526 5.48041 0.633218 0.0870741 0.182468 0.633218 1
rs61769339 1 727242 G A chr1:727242:G:A 1399 0.9124375 6.8175 6.82541 0.317872 -0.146341 0.146511 0.317872 1
rs138476838 1 732994 G A chr1:732994:G:A 1399 0.738385 6.5593 10.0037 0.512027 -0.0655441 0.0999627 0.512027 1
rs12238997 1 758351 A G chr1:758351:A:G 1399 0.9124375 6.3551 6.83119 0.352213 -0.136185 0.146387 0.352213 1
rs61769351 1 758443 G C chr1:758443:G:C 1399 0.915654 8.83591 6.7454 0.190224 -0.194194 0.148249 0.190224 1
[dym22@login0b echo]$ cat topSNPs
SNP,CHR,POS,A1,A2,Locus,N,Freq,T,SE_T,P,Effect,StdErr,P_INPUT,CONVERGE,Gene
rs529674375,12,101376840,G,A,chr12_101376840_G_A,1399,0.9656898,-18.8586,4.1128,4.53268e-06,1.11489,0.24349,4.67656e-06,1,chr12_101376840_G_A
rs9668896,12,93677236,C,T,chr12_93677236_C_T,1399,0.9417441,-26.9601,5.52197,1.04842e-06,0.884163,0.180987,1.03305e-06,1,chr12_93677236_C_T
rs74399411,13,106424444,T,C,chr13_106424444_T_C,1399,0.9889207,-11.5852,2.31447,5.57018e-07,2.16272,0.451328,1.65199e-06,1,chr13_106424444_T_C
rs56360313,13,33713257,A,T,chr13_33713257_A_T,1399,0.68549,51.7592,11.1113,3.18911e-06,-0.419235,0.0903915,3.51809e-06,1,chr13_33713257_A_T
rs4329516,1,209849556,C,T,chr1_209849556_C_T,1399,0.74589,-56.3087,10.6266,1.16535e-07,0.49864,0.093617,1.00184e-07,1,chr1_209849556_C_T
rs115562318,1,233144048,G,A,chr1_233144048_G_A,1399,0.9292352,-28.9316,6.12655,2.3315e-06,0.770799,0.164069,2.62692e-06,1,chr1_233144048_G_A
rs55901108,1,62226519,G,T,chr1_62226519_G_T,1399,0.9792709,-16.5888,3.45108,1.53327e-06,1.39286,0.290675,1.65297e-06,1,chr1_62226519_G_T
rs17461953,1,94085894,A,C,chr1_94085894_A_C,1399,0.832023,46.0283,8.81692,1.78488e-07,-0.592094,0.113384,1.76992e-07,1,chr1_94085894_A_C
rs6812051,4,76569817,A,G,chr4_76569817_A_G,1399,0.467119,-61.9498,11.8453,1.69601e-07,0.441516,0.0843009,1.62866e-07,1,chr4_76569817_A_G
rs9446804,6,72986179,G,A,chr6_72986179_G_A,1399,0.753038,-55.8363,10.1128,3.36418e-08,0.545975,0.0992578,3.78562e-08,1,chr6_72986179_G_A
rs143676388,7,111475695,A,C,chr7_111475695_A_C,1399,0.9446033,-26.4509,5.36831,8.34029e-07,0.917837,0.186902,9.07043e-07,1,chr7_111475695_A_C
It's possible this is related to your sumstats being in hg38, while 1KG is in hg19. The liftover step is imperfect and can result in losing certain SNPs: RajLabMSSM/echoLD#11
Will look into whether there's a way to make the liftover more robust (if that is indeed the problem), or at very least provide the user with a more informative error.
I had not considered that, but it was a pretty easy fix: I just ran MungeSumstats::liftover to convert the sumstats to hg19, adjusted everything else accordingly, and it ran without incident.
I'm going to try to get the in-sample LD version running when I have a little more time, but very relieved to at least have something to work with. Thank you so much for your help!
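For anyone hitting the same message, a small helper can list which loci were affected in a saved copy of the console output. This is only a sketch: it assumes the log was captured as plain text and that the error is immediately followed by the "Locus ... complete in" line, as in the failing output above. The function name `failed_loci` is hypothetical and not part of echolocatoR.

```python
import re
from typing import List

# Matches the error message followed, on the same line or the next one,
# by the "Locus <name> complete in" summary seen in the failing output.
FAILED_LOCUS = re.compile(r"invalid 'path' argument\s*Locus (\S+) complete in")


def failed_loci(log_text: str) -> List[str]:
    """Return the locus names whose run ended with "invalid 'path' argument"."""
    return FAILED_LOCUS.findall(log_text)


if __name__ == "__main__":
    # Excerpt copied from the failing run reported in this issue.
    sample = (
        "Querying VCF tabix file.\n"
        "invalid 'path' argumentLocus chr7_111475695_A_C complete in: 0.02 min\n"
    )
    print(failed_loci(sample))
```

Running the same check on the log after lifting the sumstats over to hg19 should return an empty list if every locus got past the LD step.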