tempo-mpgen column names inconsistent with schema #1068

Open
anoronh4 opened this issue Jun 1, 2022 · 1 comment

anoronh4 commented Jun 1, 2022

I looked into a couple of column names and found that they change somewhat from IGO to SMILE to tempo-mpgen. I wanted to clarify this and get it on your radar.

| IGO | SMILE | tempo-mpgen python | Voyager | sample_tracker.txt | Example value |
| --- | --- | --- | --- | --- | --- |
| cmoSampleName | sampleType | sample_class | sampleType | not shown | Adjacent Tissue |
| specimenType | sampleClass | specimen_type | sampleClass | Sample_Class_(T/N) and sampleClass | RapidAutopsy |
| tumorOrNormal | tumorOrNormal | tumorOrNormal | tumorOrNormal | tumorOrNormal | Tumor |

One issue with the first row is that this column is used in pairing to decide whether a sample is normal or tumor, yet it is not available in table form for downstream inspection or display in the tracker. Conversely, tumorOrNormal is included in sample_tracker.txt but is not used for pairing at all. This is confusing PMs when they try to debug.

I found the names used in tempo-mpgen code here:
https://github.com/mskcc/beagle/blob/master/runner/operator/tempo_mpgen_operator/bin/tempo_sample.py#L30-L31
Although Voyager's names match Schema v2.0, I found that the different names used in the beagle code made it harder to trace how samples are organized by tempo-mpgen.
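
To illustrate why tracing is hard, here is a minimal sketch that transcribes the table above into a dictionary and flags names that look the same (ignoring case and underscores) but refer to different rows. The `FIELD_NAMES` structure and the `normalize` and `find_collisions` helpers are hypothetical and written only for this illustration; they are not part of beagle, SMILE, or Voyager.

```python
from collections import defaultdict

# Field names per system, transcribed from the table in this issue.
# Each top-level key is the IGO name and is used as the row label.
FIELD_NAMES = {
    "cmoSampleName": {
        "IGO": "cmoSampleName",
        "SMILE": "sampleType",
        "tempo-mpgen": "sample_class",
        "Voyager": "sampleType",
    },
    "specimenType": {
        "IGO": "specimenType",
        "SMILE": "sampleClass",
        "tempo-mpgen": "specimen_type",
        "Voyager": "sampleClass",
    },
    "tumorOrNormal": {
        "IGO": "tumorOrNormal",
        "SMILE": "tumorOrNormal",
        "tempo-mpgen": "tumorOrNormal",
        "Voyager": "tumorOrNormal",
    },
}


def normalize(name):
    """Lowercase and drop underscores so sample_class and sampleClass compare equal."""
    return name.replace("_", "").lower()


def find_collisions(field_names):
    """Return normalized names that appear in more than one row of the table."""
    rows_by_name = defaultdict(set)
    for row_label, per_system in field_names.items():
        for name in per_system.values():
            rows_by_name[normalize(name)].add(row_label)
    return {
        name: sorted(rows)
        for name, rows in rows_by_name.items()
        if len(rows) > 1
    }


print(find_collisions(FIELD_NAMES))
```

With the table as transcribed, the only collision is `sampleclass`: tempo-mpgen's `sample_class` sits in the first row, while `sampleClass` in SMILE and Voyager sits in the second.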

@allanbolipata (Collaborator)

Unfortunately, a lot of this is my fault - the field names across the different databases are so similar that it's become convoluted and lost a lot of meaning.

We can, however, re-map them to something more suitable for a future release. If you have a preference on column names and value mapping, let us know.
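
As one possible shape for such a re-mapping, the sketch below renames the keys of a tempo-mpgen style record to the names Voyager uses, which the issue reports as matching Schema v2.0. This is only an assumption about the target names; no mapping has been agreed on in this thread. `TEMPO_TO_SCHEMA` and `to_schema_names` are hypothetical and do not exist in beagle.

```python
# Hypothetical rename table: tempo-mpgen name -> Voyager / Schema v2.0 name,
# taken from the table in the issue description. The target names are an
# assumption; the maintainers have not settled on a mapping.
TEMPO_TO_SCHEMA = {
    "sample_class": "sampleType",
    "specimen_type": "sampleClass",
}


def to_schema_names(record):
    """Return a copy of record with tempo-mpgen keys renamed; other keys are kept."""
    return {TEMPO_TO_SCHEMA.get(key, key): value for key, value in record.items()}


# Example values are the ones listed in the issue's table.
sample = {
    "sample_class": "Adjacent Tissue",
    "specimen_type": "RapidAutopsy",
    "tumorOrNormal": "Tumor",
}
print(to_schema_names(sample))
```

Keeping the rename in a single dictionary means a later change of preferred column names only touches one place.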
