Allow conditioning on a variant #45

Open
pjvandehaar opened this issue Feb 23, 2017 · 4 comments
Comments

@pjvandehaar
Collaborator

pjvandehaar commented Feb 23, 2017

Goncalo says that all you need is r (the correlation) between that variant and each other variant, computed from your own data.
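To illustrate why r is enough, here is a minimal sketch of one common summary-statistic approximation for conditioning on a single variant. It assumes both z-scores come from the same samples, that r is the genotype correlation in those samples, and that effects are small; the function name `conditional_z` is hypothetical and this is not necessarily the method that would be implemented here.

```python
import math

def conditional_z(z_target, z_cond, r):
    """Approximate z-score of a target variant after conditioning on one variant.

    z_target: marginal z-score of the variant of interest
    z_cond:   marginal z-score of the variant being conditioned on
    r:        genotype correlation between the two variants in the study data
    """
    if abs(r) >= 1.0:
        # Perfectly correlated variants cannot be separated by conditioning.
        raise ValueError("conditional test is undefined when |r| == 1")
    return (z_target - r * z_cond) / math.sqrt(1.0 - r * r)
```

With r = 0 the target's z-score is unchanged; as |r| grows, more of the target's signal is attributed to the conditioning variant.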

Option 1:

Set up an LD server that references the raw data, probably by copying Daniel's HVCF. Having raw data means that security gets complicated, and I think I don't want that.

Option 2: (Probably)

Pre-compute r from the raw data for all variant pairs within 300kb of each other. I.e., for each variant, store r for every variant in the next 300kb.
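A rough sketch of that pre-computation for one chromosome, assuming the genotypes fit in memory as a NumPy array of shape (n_variants, n_samples) and that positions are sorted. The function name `pairwise_r_within_window` and the input layout are assumptions for illustration only.

```python
import numpy as np

WINDOW_BP = 300_000

def pairwise_r_within_window(positions, genotypes, window_bp=WINDOW_BP):
    """Yield (i, j, r) for each pair i < j with positions[j] - positions[i] <= window_bp.

    positions: sorted base-pair positions on a single chromosome
    genotypes: array-like of shape (n_variants, n_samples), e.g. allele dosages
    """
    g = np.asarray(genotypes, dtype=float)
    # Center each variant so that r is a normalized dot product.
    g = g - g.mean(axis=1, keepdims=True)
    norms = np.sqrt((g * g).sum(axis=1))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[j] - positions[i] > window_bp:
                break  # positions are sorted, so no later j can be in range
            denom = norms[i] * norms[j]
            # Monomorphic variants have zero variance, so r is undefined (nan).
            r = float(g[i] @ g[j] / denom) if denom > 0 else float("nan")
            yield i, j, r
```

Each variant only looks forward, so every pair is stored once, under its earlier variant.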

Ways to store it:

  • in a tabixed file where each variant's row holds tab-separated r values for the following variants, up to 300kb away. Easy to make and store, decent to use, probably needs only 2 sigfigs, and should be smaller than matrix.tsv.gz. To look up a pair, find the first variant's row, then iterate through its r values in step with sites.tsv.gz until you hit the second variant.
  • in sqlite3. Two tables, one of [id, chr-pos-ref-alt], another of [variant1_id, variant2_id, r]?
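For the sqlite3 option, a minimal sketch of the two-table layout might look like the following. The table names `variant` and `ld`, the column name `cpra` (for chr-pos-ref-alt), and the helper functions are all hypothetical, chosen only to mirror the columns listed above.

```python
import sqlite3

def create_ld_db(path=":memory:"):
    """Create the two tables: variant ids, and r for each stored pair."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE variant (
            id INTEGER PRIMARY KEY,
            cpra TEXT NOT NULL UNIQUE
        );
        CREATE TABLE ld (
            variant1_id INTEGER NOT NULL REFERENCES variant(id),
            variant2_id INTEGER NOT NULL REFERENCES variant(id),
            r REAL NOT NULL,
            PRIMARY KEY (variant1_id, variant2_id)
        );
    """)
    return conn

def add_pair(conn, cpra1, cpra2, r):
    """Store r for a pair, creating variant rows as needed."""
    ids = []
    for cpra in (cpra1, cpra2):
        conn.execute("INSERT OR IGNORE INTO variant (cpra) VALUES (?)", (cpra,))
        row = conn.execute("SELECT id FROM variant WHERE cpra = ?", (cpra,)).fetchone()
        ids.append(row[0])
    conn.execute("INSERT OR REPLACE INTO ld VALUES (?, ?, ?)", (ids[0], ids[1], r))

def lookup_r(conn, cpra1, cpra2):
    """Return the stored r for a pair in either order, or None if absent."""
    row = conn.execute(
        """
        SELECT ld.r FROM ld
        JOIN variant a ON a.id = ld.variant1_id
        JOIN variant b ON b.id = ld.variant2_id
        WHERE (a.cpra = ? AND b.cpra = ?) OR (a.cpra = ? AND b.cpra = ?)
        """,
        (cpra1, cpra2, cpra2, cpra1),
    ).fetchone()
    return row[0] if row else None
```

Keeping the chr-pos-ref-alt strings in their own table means each stored pair costs two integers and one real rather than two strings.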
@dtaliun
Collaborator

dtaliun commented Feb 24, 2017 via email

@pjvandehaar pjvandehaar changed the title Allow conditioning on a variant Conditional Analysis Feb 28, 2017
@pjvandehaar pjvandehaar changed the title Conditional Analysis Allow conditioning on a variant Feb 28, 2017
@pjvandehaar
Collaborator Author

pjvandehaar commented Mar 6, 2017

If some 300kb regions have 10x the average variant density, that could somewhat increase the size of the pre-computed correlations. If some have 100x the average (i.e., 1% of all variants in 0.01% of the genome), we'll have a problem. Oh well, hopefully that doesn't happen.
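A back-of-the-envelope way to explore this, under a deliberately crude model: the genome is split into one dense part and one sparse part, each internally uniform, and the number of stored pairs per variant is proportional to local variant density. The function name `relative_pair_count` and the two-part model are assumptions for illustration; real density is not this tidy.

```python
def relative_pair_count(frac_variants_dense, frac_genome_dense):
    """Total stored pairs, relative to a genome with uniform variant density.

    frac_variants_dense: fraction of all variants that lie in the dense part
    frac_genome_dense:   fraction of the genome covered by the dense part
    """
    # Density of each part relative to the genome-wide average.
    dense_density = frac_variants_dense / frac_genome_dense
    sparse_density = (1 - frac_variants_dense) / (1 - frac_genome_dense)
    # Each variant stores a number of pairs proportional to its local density.
    return (frac_variants_dense * dense_density
            + (1 - frac_variants_dense) * sparse_density)
```

For example, `relative_pair_count(0.01, 0.0001)` models the case of 1% of all variants in 0.01% of the genome.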

Maybe we should only allow conditioning on variants with pval < 1e-4. But if we want to support conditional meta-analysis, then we can't have a restriction like that.

How many variants will there probably be in TOPMed?

@abecasis

abecasis commented Mar 6, 2017 via email

@pjvandehaar
Collaborator Author

(While I'm doing this, remember to also use study-specific LD when showing LD in LZ. I'm not sure how we'll handle meta-analysis LD. Perhaps it'd be fun to toggle between 1000G and study-specific LD, etc.?)


3 participants