different sample size between TCGA portal and TCGAbiolinks package #605
Some of the MAF files are empty.
Here is one example:
https://portal.gdc.cancer.gov/files/73cec020-9d79-4189-8ae7-b6be0c867371
[image: Screenshot 2023-10-12 at 12.35.40 PM.png]
@tiagochst I still don't understand. Aren't the counts supposed to be higher in the TCGA portal? Why are they higher in the TCGAbiolinks results? Put another way, how can I get the same sample size as the TCGA portal?
Hi,
Yes, the file count is the same for both: 482. Where do the 462
samples come from?
TCGAbiolinks shows 407 patients while GDC shows 419 cases.
407 comes from: unique(substr(maf$Tumor_Sample_Barcode,1,12)) %>% length
And 419 comes from the GDC portal:
[image: Screenshot 2023-10-13 at 10.50.33 AM.png]
The difference should be the ones with files but no SNV.
- "TCGA-13-0908"
- "TCGA-13-0797"
- "TCGA-13-1409"
- "TCGA-13-0725"
- "TCGA-13-0758"
- "TCGA-13-0803"
- "TCGA-24-0975"
- "TCGA-25-1321"
- "TCGA-24-0979"
- "TCGA-09-0367"
- "TCGA-13-1410"
- "TCGA-23-1031"
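The comparison above can be reproduced on a small scale. The sketch below uses made-up barcodes and a hypothetical `portal_cases` vector (neither is real TCGA-OV data) to show how counting full `Tumor_Sample_Barcode` values differs from counting 12-character patient IDs, and how `setdiff` lists the cases the portal reports that have no rows in the MAF table.

```r
# Toy MAF table: barcodes are invented for illustration only.
maf <- data.frame(
  Tumor_Sample_Barcode = c(
    "TCGA-AA-0001-01A-01D-0001-09",
    "TCGA-AA-0001-01A-01D-0001-09",  # second variant in the same aliquot
    "TCGA-AA-0001-02A-01D-0002-09",  # second aliquot from the same patient
    "TCGA-AA-0002-01A-01D-0003-09"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical case IDs as they would be listed by the portal.
portal_cases <- c("TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003")

# Aliquot-level count: unique full barcodes.
n_samples <- length(unique(maf$Tumor_Sample_Barcode))

# Patient-level count: the first 12 characters identify the patient.
maf_patients <- unique(substr(maf$Tumor_Sample_Barcode, 1, 12))
n_patients <- length(maf_patients)

# Cases known to the portal that contribute no variants to the MAF table.
missing_cases <- setdiff(portal_cases, maf_patients)

print(c(samples = n_samples, patients = n_patients))
print(missing_cases)
```

With real data, `portal_cases` would have to be exported from the GDC portal; cases whose MAF files contain no variants would then show up in `missing_cases`.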
Thanks for your answer.
I was looking at the mutation data from the TCGA portal using TCGAbiolinks and realized that the sample sizes are not the same.
For instance, for TCGA-OV the TCGA data portal shows 419 cases, whereas TCGAbiolinks shows 462 samples. The file count is the same for both: 482.
So why are they different?
This is my query in the TCGA data portal:
cases.project.project_id in ["TCGA-OV"] and files.analysis.workflow_type in ["Aliquot Ensemble Somatic Variant Merging and Masking"] and files.data_category in ["Simple Nucleotide Variation"] and files.data_type in ["Masked Somatic Mutation"]
This is the same query in the TCGAbiolinks package:
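For reference, a TCGAbiolinks query mirroring the portal filters above would typically look like the sketch below. It follows the package's documented `GDCquery` / `GDCdownload` / `GDCprepare` workflow and is not necessarily the exact code that was run here. The wrapper name `fetch_ov_maf` is hypothetical, and `access = "open"` is an assumption that does not appear in the portal query.

```r
# Sketch only: wrapped in a function so nothing is queried or downloaded
# until it is called. `fetch_ov_maf` is a hypothetical helper name.
fetch_ov_maf <- function(project = "TCGA-OV") {
  query <- TCGAbiolinks::GDCquery(
    project = project,
    data.category = "Simple Nucleotide Variation",
    data.type = "Masked Somatic Mutation",
    workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking",
    access = "open"  # assumption: open-access masked MAF files
  )
  TCGAbiolinks::GDCdownload(query)  # downloads the matching files locally
  TCGAbiolinks::GDCprepare(query)   # reads them into a single MAF data frame
}

# Usage (requires TCGAbiolinks and network access):
# maf <- fetch_ov_maf()
# length(unique(substr(maf$Tumor_Sample_Barcode, 1, 12)))
```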