Produce specific similarity artifacts for Exomizer using custom counts #125

caufieldjh · 2024-04-29T15:36:53Z

Based on #124

Establish a consistent way to count phenotypes for MP and ZP.
Produce an Exomizer similarity artifact for each of:
- HP v HP
- HP v MP
- HP v ZP
Include some provenance and reproducibility data in the header of each, including semsimian version and the command(s) run to produce the file.
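One way the provenance header could be written is as `#`-prefixed comment lines ahead of the table itself. This is only a sketch: the function name `write_provenance_header` and the field labels (`semsimian_version`, `phenotype_counts`, `command`) are hypothetical, not an existing semsimian or Exomizer format, and the version string and commands would need to be supplied by the build job.

```rust
use std::io::{self, Write};

/// Write provenance as `#`-prefixed comment lines before the table body.
/// The field labels are illustrative assumptions, not an established format.
pub fn write_provenance_header<W: Write>(
    out: &mut W,
    semsimian_version: &str,
    commands: &[&str],
    count_source: &str,
) -> io::Result<()> {
    writeln!(out, "# semsimian_version: {}", semsimian_version)?;
    writeln!(out, "# phenotype_counts: {}", count_source)?;
    // One line per command, in the order the commands were run.
    for cmd in commands {
        writeln!(out, "# command: {}", cmd)?;
    }
    Ok(())
}
```

Whatever reads the artifact would need to skip lines starting with `#`, so it is worth confirming that the consumer tolerates comment lines before settling on this layout.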

Each will need to incorporate the counts from HPOA (for HP) and Monarch's phenotype files (for MP and ZP) - or in the latter case from wherever Monarch is getting them.

caufieldjh · 2024-04-29T15:46:37Z

See the gene to phenotype tables here: https://data.monarchinitiative.org/latest/tsv/gene_associations/index.html

These appear to be identical to the owlsim tables already in use by Exomizer (see https://archive.monarchinitiative.org/latest/owlsim/data/Danio_rerio/Dr_gene_phenotype.txt)

justaddcoffee · 2024-04-29T15:47:42Z

To flesh this out a bit, here is the plan we discussed, I think:

- Start with our existing Jenkinsfile that runs semsimian
- Make each HP x HP, HP x MP, HP x ZP table separately using:
  - HP phenotype counts from HPOA
  - MP phenotype counts from IMPC and MGI
  - ZP phenotype counts from gene_phenotype.7955.tsv from the Monarch data website (here, I think?) - put on KG-Hub S3 bucket somewhere coherent
- Finish implementing custom IC maps in semsimian
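To make the "phenotype counts" step concrete, here is a minimal sketch of turning a gene-to-phenotype table into an IC map. It rests on several assumptions that are not confirmed in this thread: the input is tab-separated with the gene ID in column 1 and the phenotype ID in column 2, a term's count is the number of distinct genes directly annotated to it, and IC is taken as -log2(count / total genes). It does not propagate counts to ancestor terms, which a real implementation would probably need. The function names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Count distinct genes per phenotype term from tab-separated text
/// (assumed layout: gene ID in column 1, phenotype ID in column 2).
/// Returns the per-term counts and the total number of distinct genes.
pub fn count_phenotypes(tsv: &str) -> (HashMap<String, usize>, usize) {
    let mut genes_by_term: HashMap<String, HashSet<String>> = HashMap::new();
    let mut all_genes: HashSet<String> = HashSet::new();
    for line in tsv.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        let mut cols = line.split('\t');
        if let (Some(gene), Some(term)) = (cols.next(), cols.next()) {
            all_genes.insert(gene.to_string());
            genes_by_term
                .entry(term.to_string())
                .or_default()
                .insert(gene.to_string());
        }
    }
    let counts: HashMap<String, usize> = genes_by_term
        .into_iter()
        .map(|(term, genes)| (term, genes.len()))
        .collect();
    (counts, all_genes.len())
}

/// Convert counts to information content as -log2(count / total).
/// Terms with a zero count are omitted rather than given infinite IC.
pub fn counts_to_ic(counts: &HashMap<String, usize>, total: usize) -> HashMap<String, f64> {
    let mut ic = HashMap::new();
    if total == 0 {
        return ic;
    }
    for (term, &n) in counts {
        if n > 0 {
            ic.insert(term.clone(), -((n as f64) / (total as f64)).log2());
        }
    }
    ic
}
```

Under these assumptions, a term annotated to one gene out of four gets an IC of 2, and a term annotated to every gene gets an IC of 0.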

caufieldjh · 2024-04-29T15:54:25Z

IMPC phenotypes may not already be included in Monarch G2P tables, but Exomizer does use them.

caufieldjh · 2024-05-06T18:11:24Z

So to get this done, we will need to:

Decouple creation/updating of the IC map and closure map. They are currently stored separately (see semsimian/src/lib.rs, lines 68 to 78 in d66e20c):

    pub struct RustSemsimian {
        spo: Vec<(TermID, Predicate, TermID)>,
        predicates: Option<Vec<Predicate>>,
        ic_map: HashMap<PredicateSetKey, HashMap<TermID, f64>>,
        // ic_map is something like {("is_a_+_part_of"), {"GO:1234": 1.234}}
        closure_map: HashMap<PredicateSetKey, HashMap<TermID, HashSet<TermID>>>,
        // closure_map is something like {("is_a_+_part_of"), {"GO:1234": {"GO:1234", "GO:5678"}}}
        embeddings: Embeddings,
        pairwise_similarity_attributes: Option<Vec<String>>,
        prefix_expansion_cache: HashMap<TermID, HashMap<TermID, HashSet<TermID>>>,
        max_ic_cache: HashMap<String, (HashSet<String>, f64)>,

but updated at the same time.

Check for whether we have been provided with a custom IC map (as a filepath). If so, don't create a new one, but parse the provided one and use that instead.
Raise an error if the custom IC map and the closures are not aligned.
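The "use the custom IC map if a filepath was provided" step might look roughly like the sketch below. It is not the semsimian implementation: `parse_ic_map` and `resolve_ic_map` are hypothetical names, the file format (one `term<TAB>IC` pair per line, with `#` comments allowed and no column header row) is an assumption, and the map is flattened to `HashMap<String, f64>` rather than being keyed by `PredicateSetKey` as in the struct above.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Parse `term<TAB>ic` lines into a map (assumed file format).
pub fn parse_ic_map(text: &str) -> io::Result<HashMap<String, f64>> {
    let mut ic_map = HashMap::new();
    for (i, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        let mut cols = line.split('\t');
        let (term, value) = match (cols.next(), cols.next()) {
            (Some(t), Some(v)) => (t, v),
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("line {}: expected two tab-separated columns", i + 1),
                ))
            }
        };
        let ic: f64 = value.trim().parse().map_err(|e| {
            io::Error::new(
                io::ErrorKind::InvalidData,
                format!("line {}: bad IC value {:?}: {}", i + 1, value, e),
            )
        })?;
        ic_map.insert(term.to_string(), ic);
    }
    Ok(ic_map)
}

/// Use the custom IC map when a path is given; otherwise fall back to
/// whatever computes the default map (passed in as a closure).
pub fn resolve_ic_map<F>(
    custom_path: Option<&Path>,
    compute_default: F,
) -> io::Result<HashMap<String, f64>>
where
    F: FnOnce() -> HashMap<String, f64>,
{
    match custom_path {
        Some(path) => parse_ic_map(&fs::read_to_string(path)?),
        None => Ok(compute_default()),
    }
}
```

Passing the default computation as a closure means it is never run when a custom map is supplied, which is one way to decouple IC map creation from closure map creation.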

caufieldjh · 2024-05-06T20:46:05Z

See also #47 - but for terms missing from IC map instead of closure map

caufieldjh · 2024-05-06T20:49:44Z

If the closure map and IC map do not contain each other's keys, raise an error.
We will work under the assumption that the user provides inputs that are in alignment and contain all necessary IDs.
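A minimal sketch of that key check, assuming both maps have already been narrowed to a single predicate set and use plain `String` term IDs. `check_alignment` and `MisalignedMaps` are hypothetical names, not part of semsimian. The check only compares top-level keys and does not look inside each closure set.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Terms present in one map but not the other, sorted for stable messages.
#[derive(Debug, PartialEq)]
pub struct MisalignedMaps {
    pub missing_from_ic: Vec<String>,
    pub missing_from_closure: Vec<String>,
}

/// Return an error unless the IC map and closure map have identical key sets.
pub fn check_alignment(
    ic_map: &HashMap<String, f64>,
    closure_map: &HashMap<String, HashSet<String>>,
) -> Result<(), MisalignedMaps> {
    let ic_keys: BTreeSet<&String> = ic_map.keys().collect();
    let closure_keys: BTreeSet<&String> = closure_map.keys().collect();

    // Keys the closure map has that the IC map lacks, and vice versa.
    let missing_from_ic: Vec<String> = closure_keys
        .difference(&ic_keys)
        .map(|t| t.to_string())
        .collect();
    let missing_from_closure: Vec<String> = ic_keys
        .difference(&closure_keys)
        .map(|t| t.to_string())
        .collect();

    if missing_from_ic.is_empty() && missing_from_closure.is_empty() {
        Ok(())
    } else {
        Err(MisalignedMaps {
            missing_from_ic,
            missing_from_closure,
        })
    }
}
```

Returning both lists of missing terms, rather than a bare error, should make it easier for a user to see which input file needs fixing.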
