Is your feature request related to a problem? Please describe.
The new Metadata class #432 will need to be able to represent metadata in a variety of different formats. We need to be able to import any of these formats and switch between them at will.
Describe the solution you'd like
We can use subclasses to describe the formats:
Metadata (generic)
Qiime1
Qiime2
MMEDS Full
MMEDS Subject
MMEDS Specimen
LEfSe
SRA
At each level, there should be one function going 'up' (e.g. Qiime2 -> Metadata generic) and one going 'down' (e.g. Metadata generic -> MMEDS Full).
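A minimal sketch of what the 'up' and 'down' functions might look like, assuming the generic form is just a mapping of column names to values. The names `to_generic`, `from_generic`, `convert`, the `columns` attribute, and the `sample-id` / `SampleID` renaming are all hypothetical and not taken from #432:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class Metadata:
    """Generic representation: column name -> list of cell values."""
    columns: Dict[str, List[str]] = field(default_factory=dict)

    def to_generic(self) -> "Metadata":
        """Go 'up': return a plain Metadata without format-specific details."""
        return Metadata(dict(self.columns))

    @classmethod
    def from_generic(cls, generic: "Metadata") -> "Metadata":
        """Go 'down': build this format from the generic representation."""
        return cls(dict(generic.columns))


class Qiime2(Metadata):
    # Hypothetical format-specific detail: the ID column is renamed on the way up/down.
    ID_COLUMN = "sample-id"

    def to_generic(self) -> Metadata:
        cols = dict(self.columns)
        if self.ID_COLUMN in cols:
            cols["SampleID"] = cols.pop(self.ID_COLUMN)
        return Metadata(cols)

    @classmethod
    def from_generic(cls, generic: Metadata) -> "Qiime2":
        cols = dict(generic.columns)
        if "SampleID" in cols:
            cols[cls.ID_COLUMN] = cols.pop("SampleID")
        return cls(cols)


class MmedsFull(Metadata):
    """Placeholder: inherits the default up/down behaviour."""


def convert(source: Metadata, target_cls: Type[Metadata]) -> Metadata:
    """Switch formats by going up to generic, then down to the target."""
    return target_cls.from_generic(source.to_generic())
```

With this shape, each format only needs to know how to reach the generic form and back, so adding a format means writing two functions rather than one converter per pair of formats.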
Converting to MMEDS
Converting to MMEDS from another format presents by far the biggest challenge. In MMEDS format, we have five header rows: Table Name, Var Name, Opt/Req, Format, Unit/Length Restriction. If we're trying to get, say, the MMEDS Var 'Weight' from a Qiime2 file, we need to be prepared for several situations, for example:
Var name "weight" (lowercase)
Var name "mass" (something related)
Var name "Subject_Information_Pounds" (something vaguely but not simply related)
Var name "k" (essentially no information at all)
Then, once it is determined that the variable in question is indeed Weight, we also need to infer units. What if the data doesn't include any units at all and is purely numerical?
@circlespie and I discussed a solution that would use a two-tiered approach: a first pass using some kind of AI assistance, such as a word-association cluster that would be able to infer that a related word such as 'mass' implies the variable 'Weight'; then a fallback that checks uncertain matches with the user, asking "Does 'Subanalysis' match 'SpecimenType'? y/n".
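A rough sketch of the two tiers, under heavy assumptions: the `SYNONYMS` table is a hand-written stand-in for a real word-association model, `difflib` only measures string similarity (not meaning), and every function name here is hypothetical. The `ask` parameter defaults to `input` so the prompt can be replaced in tests:

```python
import difflib
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a word-association model: MMEDS var -> related words.
SYNONYMS = {
    "Weight": {"weight", "mass", "pounds", "kilograms"},
    "SpecimenType": {"specimentype", "sampletype"},
}


def tokens(label: str) -> List[str]:
    """Split e.g. 'Subject_Information_Pounds' into lowercase tokens."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", label) if t]


def automatic_match(label: str, mmeds_vars: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Tier 1: synonym lookup, then close string matches against the var names."""
    toks = tokens(label)
    for var in mmeds_vars:
        if set(toks) & SYNONYMS.get(var, set()):
            return var
    lowered = {v.lower(): v for v in mmeds_vars}
    close = difflib.get_close_matches("".join(toks), list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


def confirm_with_user(label: str, mmeds_vars: List[str],
                      ask: Callable[[str], str] = input) -> Optional[str]:
    """Tier 2: offer the three most similar var names to the user, one at a time."""
    lowered = {v.lower(): v for v in mmeds_vars}
    ranked = difflib.get_close_matches(label.lower(), list(lowered), n=3, cutoff=0.0)
    for key in ranked:
        answer = ask(f"Does '{label}' match '{lowered[key]}'? y/n ")
        if answer.strip().lower().startswith("y"):
            return lowered[key]
    return None


def match_label(label: str, mmeds_vars: List[str],
                ask: Callable[[str], str] = input) -> Optional[str]:
    """Try the automatic pass first and only bother the user if it fails."""
    return automatic_match(label, mmeds_vars) or confirm_with_user(label, mmeds_vars, ask)
```

In this sketch 'weight', 'mass' and 'Subject_Information_Pounds' resolve in the first tier, while a label like 'k' always falls through to the user prompt, and may still end up unmatched.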
Alternatively, a user could provide, as supplementary input, a dictionary explicitly defining what each label maps to. However, this would require user preprocessing, the very thing we're attempting to avoid. Further discussion on this issue is warranted.
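To make the dictionary option concrete, here is a small sketch in which the mapping does not have to be complete: labels the user has defined are resolved directly, and the rest are returned for whatever inference step is chosen. The function name `apply_user_mapping` and the example labels are hypothetical:

```python
from typing import Dict, Iterable, List, Tuple


def apply_user_mapping(labels: Iterable[str],
                       user_mapping: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split labels into those the user mapped explicitly and those left over."""
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for label in labels:
        if label in user_mapping:
            resolved[label] = user_mapping[label]
        else:
            unresolved.append(label)
    return resolved, unresolved


# Example: the user only defines the one label that cannot be guessed.
resolved, unresolved = apply_user_mapping(
    ["k", "mass", "Subanalysis"],
    {"k": "Weight"},
)
```

Treating the dictionary as an optional override rather than a requirement would keep user preprocessing limited to labels that carry essentially no information.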