Possible acceleration of similarity.rs #87

Open
hrshdhgd opened this issue Aug 11, 2023 · 0 comments
Specifically, this function in similarity.rs looks like a candidate for parallelization:

pub fn calculate_term_pairwise_information_content(
    closure_map: &HashMap<PredicateSetKey, HashMap<TermID, HashSet<TermID>>>,
    ic_map: &HashMap<PredicateSetKey, HashMap<TermID, f64>>,
    entity1: &HashSet<TermID>,
    entity2: &HashSet<TermID>,
    predicates: &Option<Vec<Predicate>>,
) -> f64 {
    // At each iteration, it calculates the IC score using the calculate_max_information_content function,
    // and if the calculated IC score is greater than the current maximum IC (max_resnik_sim_e1_e2),
    // it updates the maximum IC value. Thus, at the end of the iterations,
    // max_resnik_sim_e1_e2 will contain the highest IC score among all the comparisons,
    // representing the best match between entity1 and entity2.
    let mut entity1_to_entity2_sum_resnik_sim = 0.0;
    for e1_term in entity1.iter() {
        let max_resnik_sim_e1_e2 = entity2
            .iter()
            .map(|e2_term| {
                calculate_max_information_content(closure_map, ic_map, e1_term, e2_term, predicates)
            })
            .fold(0.0, |max_ic: f64, (_max_ic_ancestors, ic)| max_ic.max(ic));
        entity1_to_entity2_sum_resnik_sim += max_resnik_sim_e1_e2;
    }
    // The final result will be the average Resnik similarity score between the two sets
    entity1_to_entity2_sum_resnik_sim / entity1.len() as f64
}
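
For reference, the quantity this function computes can be written out as a formula. This is only a restatement of the code above; the symbols are notation introduced here, with $\mathrm{IC}(t_1, t_2)$ standing for the score returned by calculate_max_information_content for a pair of terms, and $E_1$, $E_2$ for entity1 and entity2:

```latex
\mathrm{sim}(E_1, E_2)
  = \frac{1}{|E_1|} \sum_{t_1 \in E_1}
    \max\Bigl(0,\; \max_{t_2 \in E_2} \mathrm{IC}(t_1, t_2)\Bigr)
```

The outer $\max(0, \cdot)$ reflects the fold starting from 0.0. Each term of the sum depends on only one $t_1$, which is why the outer loop can be split across threads without changing the result.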

Possible solution:

use rayon::prelude::*;

pub fn calculate_term_pairwise_information_content(
    closure_map: &HashMap<PredicateSetKey, HashMap<TermID, HashSet<TermID>>>,
    ic_map: &HashMap<PredicateSetKey, HashMap<TermID, f64>>,
    entity1: &HashSet<TermID>,
    entity2: &HashSet<TermID>,
    predicates: &Option<Vec<Predicate>>,
) -> f64 {
    // For each entity1 term, find its best IC score against entity2 in parallel, then sum.
    let entity1_to_entity2_sum_resnik_sim: f64 = entity1
        .par_iter()
        .map(|e1_term| {
            entity2
                .par_iter()
                .map(|e2_term| {
                    // The helper returns a tuple; only the IC score is compared.
                    let (_max_ic_ancestors, ic) = calculate_max_information_content(
                        closure_map, ic_map, e1_term, e2_term, predicates,
                    );
                    ic
                })
                // Start from 0.0, matching the sequential fold above.
                .reduce(|| 0.0, f64::max)
        })
        .sum();

    entity1_to_entity2_sum_resnik_sim / entity1.len() as f64
}
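
To sanity-check that parallelizing the outer loop leaves the result unchanged, here is a minimal, std-only sketch of the same "best match, then average" pattern. It uses std::thread::scope rather than rayon so that it compiles without extra crates. Everything in it is an assumption made for illustration: toy_ic is a hypothetical stand-in for calculate_max_information_content, terms are plain strings, and the function names are invented here.

```rust
use std::collections::HashSet;
use std::thread;

/// Hypothetical stand-in for `calculate_max_information_content`:
/// scores a pair of terms by the length of their common prefix.
fn toy_ic(a: &str, b: &str) -> f64 {
    a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count() as f64
}

/// Best score of one term against every term in `entity2`, starting from 0.0.
fn best_match(e1: &str, entity2: &HashSet<String>) -> f64 {
    entity2.iter().map(|e2| toy_ic(e1, e2)).fold(0.0, f64::max)
}

/// Sequential reference: average of the per-term best scores.
/// Like the original function, this yields NaN when `entity1` is empty.
fn best_match_average_seq(entity1: &HashSet<String>, entity2: &HashSet<String>) -> f64 {
    let sum: f64 = entity1.iter().map(|e1| best_match(e1, entity2)).sum();
    sum / entity1.len() as f64
}

/// Same computation with the outer loop split across scoped threads.
fn best_match_average_par(
    entity1: &HashSet<String>,
    entity2: &HashSet<String>,
    n_threads: usize,
) -> f64 {
    let terms: Vec<&String> = entity1.iter().collect();
    let n = n_threads.max(1);
    // Ceiling division; `.max(1)` avoids a zero chunk size for empty input.
    let chunk_size = ((terms.len() + n - 1) / n).max(1);

    let sum: f64 = thread::scope(|s| {
        // One thread per chunk of entity1 terms; each returns a partial sum.
        let handles: Vec<_> = terms
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|e1| best_match(e1, entity2))
                        .sum::<f64>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    });

    sum / entity1.len() as f64
}
```

Calling both functions on the same pair of sets should give the same value, up to floating-point rounding from the different summation order. Whether the parallel version is actually faster depends on how expensive each pairwise call is and how large the sets are, so it would be worth benchmarking before adopting it. The same caution applies to nesting par_iter inside par_iter: parallelizing only the outer loop may be enough.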