
Problem with new_tumor_event_after_initial_treatment in XML clinical data #58

Open
AlirezaShokrollahi opened this issue Sep 18, 2020 · 5 comments

Comments

@AlirezaShokrollahi

AlirezaShokrollahi commented Sep 18, 2020

Hello,
When I used TCGAbiolinks to download clinical data, I found a problem in the XML clinical data.
The XML clinical data has a column, new_tumor_event_after_initial_treatment, that indicates whether a new tumor event occurred. For some TCGA projects, such as TCGA-BRCA, most values in this column are "not available" or "unknown". However, in the files downloaded directly from the GDC, most values in this column are YES or NO.
Could you please help me and explain what happened?
Thank you.

@tiagochst
Contributor

Hello,

I need more details to understand the problem. The GDC data is populated from the XML files.

In XML clinical data, There is a column that shows new_tumor_event_after_initial_treatment or new tumor event.

Which XML file are you looking at?
There is a new_tumor_event XML section with all the new tumor event information. For TCGA-BRCA, there are 91 patients with a new_tumor_event: https://rpubs.com/tiagochst/GDC_clinical_indexed_vs_XML

@AlirezaShokrollahi
Author

AlirezaShokrollahi commented Sep 18, 2020

Hi,
I wrote this code:
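library(TCGAbiolinks)

# Query the XML clinical files for the project
clinical <- GDCquery(project = "TCGA-BLCA",
                     data.category = "Clinical",
                     file.type = "xml")
GDCdownload(clinical)
# Parse the follow-up section of the downloaded XML files
clinicalf <- GDCprepare_clinic(clinical, clinical.info = "follow_up")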

I looked at the follow-up data in the XML files.
In the follow-up data, there is a column, new_tumor_event_after_initial_treatment, that indicates whether a new tumor event occurred.

I downloaded the follow-up data directly from the GDC and have attached it for you.

You can download the follow-up data with TCGAbiolinks and then compare the values of that column with the follow-up data I sent you.
gdc_download_20200918_165132.755004.tar.gz
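One way to make this comparison concrete is to tally the values of the column in each table while keeping missing entries visible. The sketch below is a hypothetical helper (`tally_column` is not part of TCGAbiolinks); it assumes the prepared follow-up data frame contains a column named new_tumor_event_after_initial_treatment, and it treats empty strings as missing so that blank cells and true NAs are counted together.

```r
# Hypothetical helper (not part of TCGAbiolinks): count the values of one column,
# keeping missing entries visible so counts can be compared across sources.
tally_column <- function(df, column) {
  if (!column %in% names(df)) {
    stop(sprintf("column '%s' not found", column))
  }
  values <- as.character(df[[column]])
  # Blank cells are recoded as NA so they are counted together with true NAs
  values[!is.na(values) & trimws(values) == ""] <- NA
  table(values, useNA = "ifany")
}

# Example usage, assuming `clinicalf` comes from
# GDCprepare_clinic(..., clinical.info = "follow_up"):
# tally_column(clinicalf, "new_tumor_event_after_initial_treatment")
```

Running the same helper on the matching column of the GDC-downloaded table shows at a glance whether the two sources disagree on the YES/NO counts or only on how missing values are encoded.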

@AlirezaShokrollahi
Author

AlirezaShokrollahi commented Sep 18, 2020 via email

@tiagochst
Contributor

I need to look more closely at the problem with the XML parsing. In the meantime, those BCR Biotab files can be downloaded with TCGAbiolinks as follows:

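library(TCGAbiolinks)

# Query the tab-delimited BCR Biotab clinical supplement files
query.biotab <- GDCquery(
    project = "TCGA-BRCA",
    data.category = "Clinical",
    data.type = "Clinical Supplement",
    data.format = "BCR Biotab")
GDCdownload(query.biotab)
# Read all Biotab tables into a named list of data frames
clinical.BCRtab.all <- GDCprepare(query.biotab)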

@AlirezaShokrollahi
Author

Hi
Thank you.


2 participants