european_samples.tsv is ambiguous for mapping to application-specific ids #44

Open
ddrichel opened this issue Mar 24, 2023 · 0 comments

Comments

@ddrichel

The file european_samples.tsv from
https://broad-ukb-sumstats-us-east-1.s3.amazonaws.com/round2/additive-tsvs/european_samples.tsv.bgz
contains plate and well IDs, which are meant to be used to look up application-specific sample IDs in ukb_sqc_v2.txt.
However, the batch ID is also required, because plate and well IDs do not map to sample IDs unambiguously. For instance, the following entries appear twice in european_samples.tsv:
SMP4_0014640A H04
SMP4_0014502A E05
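
A minimal sketch of how such repeated entries can be found. It assumes each row is tab-separated with the plate ID in the first column and the well ID in the second; the actual column layout of european_samples.tsv may differ. The function name and the toy rows (PLATE_A, PLATE_B) are hypothetical and not taken from the real file.

```python
from collections import Counter


def duplicated_plate_well(lines, sep="\t"):
    """Return the sorted (plate, well) pairs that occur more than once.

    Assumption: column 0 is the plate ID and column 1 is the well ID.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        counts[(fields[0], fields[1])] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


# Toy input with made-up IDs; one pair is repeated.
example_rows = [
    "PLATE_A\tH04",
    "PLATE_B\tE05",
    "PLATE_A\tH04",
    "PLATE_B\tA08",
]
duplicates = duplicated_plate_well(example_rows)
print(duplicates)
```

For the real file, the same function could be fed the lines of the decompressed .bgz, for example via `gzip.open(path, "rt")`, since BGZF is a gzip-compatible format.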

Overall, the following entries from european_samples.tsv appear twice in ukb_sqc_v2.txt in different batches:
SMP4_0013746A H09
SMP4_0014502A A08
SMP4_0014502A E05
SMP4_0014503A F01
SMP4_0014641A B04
SMP4_0014641A C05
SMP4_0016202A B01
SMP4_0016202A C01
SMP4_0012383A C09
SMP4_0014640A H04
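
The cross-batch check behind this list can be sketched as follows. It assumes the relevant fields of ukb_sqc_v2.txt have already been parsed into (batch, plate, well) tuples; how those columns are extracted is left out because it depends on the file layout. The function name and the toy records are hypothetical.

```python
from collections import defaultdict


def ambiguous_pairs(sqc_records, wanted_pairs):
    """Map each wanted (plate, well) pair to its batches if there are several.

    sqc_records:  iterable of (batch, plate, well) tuples (assumed pre-parsed).
    wanted_pairs: set of (plate, well) pairs, e.g. from european_samples.tsv.
    """
    batches = defaultdict(set)
    for batch, plate, well in sqc_records:
        if (plate, well) in wanted_pairs:
            batches[(plate, well)].add(batch)
    # A pair is ambiguous when it is seen in more than one batch.
    return {pair: sorted(b) for pair, b in batches.items() if len(b) > 1}


# Toy records with made-up plate IDs.
toy_sqc = [
    ("Batch_b043", "PLATE_A", "H04"),
    ("Batch_b053", "PLATE_A", "H04"),
    ("Batch_b043", "PLATE_B", "E05"),
]
toy_wanted = {("PLATE_A", "H04"), ("PLATE_B", "E05")}
result = ambiguous_pairs(toy_sqc, toy_wanted)
print(result)
```

With a batch column present in european_samples.tsv, the lookup key would become (batch, plate, well), which removes this ambiguity in the toy case above.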

Sex and self-reported British ancestry are not sufficient to resolve the ambiguities for all samples.
Could you provide a version of european_samples.tsv with the batch ID (e.g. Batch_b043, Batch_b053, ...) added?

Thanks in advance

Dmitriy
