Allow conditioning on a variant #45
Hi Peter,
I also have pythonic code that uses raw tabix'ed VCFs + numpy linear algebra (which can be sped up by compiling against BLAS/LAPACK).
Daniel
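A minimal sketch of the kind of computation Daniel describes: Pearson `r` between one variant and its neighbors, computed from genotype dosage vectors with numpy (so the heavy lifting goes through BLAS). How the dosages get parsed out of the tabix'ed VCF is only hinted at; `fetch_dosages` and `fetch_dosages_window` are hypothetical helpers, not anything from this repo.

```python
import numpy as np

def ld_r(anchor: np.ndarray, neighbors: np.ndarray) -> np.ndarray:
    """Pearson r between one dosage vector (n_samples,) and many
    (n_variants, n_samples). Assumes all sites are polymorphic."""
    a = anchor - anchor.mean()
    b = neighbors - neighbors.mean(axis=1, keepdims=True)
    # Matrix product and norms are BLAS-backed in numpy.
    return (b @ a) / np.sqrt((a @ a) * (b * b).sum(axis=1))

# Hypothetical usage, with dosages already parsed from a tabix'ed VCF
# (e.g. via pysam.TabixFile):
# anchor = fetch_dosages("chr6", 32_500_000)                      # (n_samples,)
# neighbors = fetch_dosages_window("chr6", 32_500_000, 300_000)   # (n_variants, n_samples)
# r = ld_r(anchor, neighbors)
```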
On Feb 23, 2017, at 5:26 PM, Peter VandeHaar ***@***.***> wrote:
Goncalo says that all you need is r between that variant and each other for your data. (separate for cases and controls?)
Option 1: (I GUESS SO)
Set up an LD server that references the raw data, probably by copying Daniel's HVCF.
Option 2: (NAH)
Pre-compute r for all variant pairs within 300kb from the raw data.
If some 300kb regions have 10x the average variant density, that could somewhat increase the size of the pre-computed correlations. If some have 100x the average (i.e., 1% of all variants in 0.01% of the genome), we'll have a problem: pair counts scale with the square of local density, so that one region alone would contribute roughly as many pairs as the entire rest of the genome. Oh well, hopefully that doesn't happen. Maybe we should only allow conditioning on variants with pval < 1e-4. But if we want to support conditional meta-analysis, then we can't have a restriction like that. How many variants will there probably be in TOPMed?
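A back-of-envelope sketch of that density arithmetic. The variant count is a placeholder for a TOPMed-scale dataset, not an answer to Peter's question:

```python
# Assumed numbers, not measurements: one-directional pairs within 300kb,
# genome-wide vs. from a single 100x-dense region holding 1% of variants.
N = 500_000_000            # placeholder TOPMed-scale variant count (assumption)
GENOME_BP = 3_000_000_000
WINDOW_BP = 300_000

density = N / GENOME_BP                     # average variants per bp
baseline_pairs = N * density * WINDOW_BP    # each variant vs. its next-300kb neighbors

# A region with 1% of variants at 100x average density:
hot_pairs = (0.01 * N) * (100 * density) * WINDOW_BP
print(f"{baseline_pairs:.1e} pairs genome-wide, {hot_pairs:.1e} from the dense region alone")
```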
Most variants and regions will have p < 10^-4 for something.
The densest regions are around the HLA genes in the MHC on chromosome 6.
If we set this up right, we should only need one covariance table for many traits and variants in one PheWeb.
G
(While I'm doing this, remember to also use study-specific LD when showing LD in LZ (LocusZoom). I'm not sure how we'll handle meta-analysis LD. Perhaps it'd be fun to toggle between 1000G and study-specific LD, &c?)
Goncalo says that all you need is `r` between that variant and each other variant for your data.

Option 1:
Set up an LD server that references the raw data, probably by copying Daniel's HVCF. Having raw data means that security gets complicated, and I think I don't want that.

Option 2: (Probably)
Pre-compute `r` for all variant pairs within 300kb from the raw data. I.e., for each variant, store `r` for all variants in the next 300kb (see the sketches after this list).

Ways to store it:

- Lines of `r`s until they hit 300kb. Easy to make/store, decent to use, probably just 2 sigfigs, should be smaller than `matrix.tsv.gz`. So, look up the first variant, and iterate through `r`s at the same time as iterating through `sites.tsv.gz` until you hit the second variant.
- A table of [`id`, `chr-pos-ref-alt`], another of [`variant1_id`, `variant2_id`, `r`]?
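A minimal sketch of Option 2's pre-compute pass, under assumptions: the input is an iterable of (position, dosage-vector) pairs for one chromosome, sorted by position and already parsed from the raw data; all names here are mine, not PheWeb's.

```python
from collections import deque
import numpy as np

WINDOW_BP = 300_000

def precompute_r(variants):
    """variants: (pos, dosages) pairs, sorted by pos, one chromosome.
    Yields (pos1, pos2, r) for each variant against the next 300kb."""
    window = deque()  # (pos, centered_dosages, norm) still within 300kb
    for pos, dosages in variants:
        centered = dosages - dosages.mean()
        norm = float(np.sqrt(centered @ centered))  # assumes polymorphic sites
        # Drop variants that have fallen more than 300kb behind.
        while window and pos - window[0][0] > WINDOW_BP:
            window.popleft()
        for prev_pos, prev_centered, prev_norm in window:
            r = float(prev_centered @ centered) / (prev_norm * norm)
            yield (prev_pos, pos, round(r, 2))  # ~2 sigfigs, as suggested above
        window.append((pos, centered, norm))
```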
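And a sketch of the second storage idea; one natural reading is a sqlite3 database, with table and column names guessed from the bracketed lists above:

```python
import sqlite3

conn = sqlite3.connect("ld.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS variant (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE               -- chr-pos-ref-alt
    );
    CREATE TABLE IF NOT EXISTS ld (
        variant1_id INTEGER REFERENCES variant(id),
        variant2_id INTEGER REFERENCES variant(id),
        r           REAL,
        PRIMARY KEY (variant1_id, variant2_id)
    );
""")

# Look up r for a pair by chr-pos-ref-alt (the variant names are made up):
row = conn.execute(
    "SELECT ld.r FROM ld"
    " JOIN variant v1 ON v1.id = ld.variant1_id"
    " JOIN variant v2 ON v2.id = ld.variant2_id"
    " WHERE v1.name = ? AND v2.name = ?",
    ("6-32500000-A-G", "6-32600000-C-T"),
).fetchone()
```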