tempo-mpgen column names inconsistent with schema #1068

anoronh4 · 2022-06-01T21:38:12Z

I have looked into a couple of column names and found that they change a bit from IGO to SMILE to tempo-mpgen. I just wanted to clarify this and get it on your radar.

| IGO | SMILE | tempo-mpgen python | Voyager | `sample_tracker.txt` | Example value |
| --- | --- | --- | --- | --- | --- |
| `cmoSampleName` | `sampleType` | `sample_class` | `sampleType` | not shown | `Adjacent Tissue` |
| `specimenType` | `sampleClass` | `specimen_type` | `sampleClass` | `Sample_Class_(T/N)` and `sampleClass` | `RapidAutopsy` |
| `tumorOrNormal` | `tumorOrNormal` | `tumorOrNormal` | `tumorOrNormal` | `tumorOrNormal` | `Tumor` |

One issue with the first item is that this column is used in pairing to define whether a sample is normal or tumor, but it is not available in table form for downstream inspection or for display in the tracker. `tumorOrNormal` is included in `sample_tracker.txt` but is not used at all for pairing. This is creating confusion for PMs when they try to debug.

I found the names used in tempo-mpgen code here:
https://github.com/mskcc/beagle/blob/master/runner/operator/tempo_mpgen_operator/bin/tempo_sample.py#L30-L31
Although Voyager's names match Schema v2.0, I found that the different names used in the beagle code made it more difficult to trace how samples are being organized with tempo-mpgen.
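
To make the mismatch concrete, here is a minimal sketch of a lookup from the tempo-mpgen Python names to the Schema v2.0 names that Voyager uses, built only from the table above. `TEMPO_TO_SCHEMA` and `to_schema_names` are hypothetical names for illustration; they are not part of beagle.

```python
# Hypothetical lookup derived from the table in this issue; not beagle code.
# Keys are the tempo-mpgen Python names, values are the Voyager / Schema v2.0 names.
TEMPO_TO_SCHEMA = {
    "sample_class": "sampleType",
    "specimen_type": "sampleClass",
    "tumorOrNormal": "tumorOrNormal",
}


def to_schema_names(record):
    """Return a copy of `record` with tempo-mpgen keys renamed to schema names.

    Keys that are not in the lookup are passed through unchanged.
    """
    return {TEMPO_TO_SCHEMA.get(key, key): value for key, value in record.items()}


# Example values are taken from the "Example value" column of the table.
example = {
    "sample_class": "Adjacent Tissue",
    "specimen_type": "RapidAutopsy",
    "tumorOrNormal": "Tumor",
}
renamed = to_schema_names(example)
```

Something like this could be used when writing out a tracker table so that the column actually used for pairing shows up under its schema name.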

allanbolipata · 2022-06-02T09:18:28Z

Unfortunately, a lot of this is my fault - the field names across the different databases are so similar that it's become convoluted and lost a lot of meaning.

We can, however, re-map them to something more suitable for a future release. If you have a preference on column names and value mapping, let us know.
