Aggregate results of a custom model #2697

@oroojlooy

I have four questions.

  1. Is there any automated way to aggregate the results of a custom model for use in a local leaderboard?
    Right now my results contain scores for the validation and test splits, for different languages. I was not sure whether I only need to report the average across all languages, and I am assuming text only. Can you please clarify?

  2. When I download the result table via mteb.load_results(tasks=tasks), it gives me a bunch of warnings like:

MTOPDomainClassification: Missing subsets {'en'} for split test
MassiveIntentClassification: Missing subsets {'en'} for split test
MassiveScenarioClassification: Missing subsets {'en'} for split test
MassiveIntentClassification: Missing subsets {'en'} for split test

Is this expected? If I want an apples-to-apples comparison, I assume I need to remove the en results of the test split for these datasets for my custom model as well, right?

  3. Which metric is reported for each task type/dataset? For example, for classification tasks, do you report F1 score or accuracy? The same question applies to other task types such as retrieval, summarization, etc.
    From the paper I found the following, but wanted to confirm:
{"BitextMining": "F1", "Classification": "accuracy", "Clustering": "v_measure",
 "PairClassification": "cosine_ap", "Reranking": "map", "Retrieval": "ndcg_at_10",
 "STS": "cosine_spearman", "Summarization": "cosine_spearman"}
  4. The per-dataset results shown on the leaderboard for the different models differ from the results that can be downloaded via results = mteb.load_results(tasks=tasks). For example, for the STS17 dataset and "google/gemini-embedding-exp-03-07", the leaderboard has 88.57 while mteb.load_results(tasks=tasks) has 91.6. Any idea why?
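
To make questions 1 and 2 concrete, here is a minimal sketch of the kind of aggregation being asked about. It is plain Python and does not use any mteb API. The nested layout scores[task][split][subset], the example numbers, and the helper names task_score and aggregate are all assumptions made for illustration, and the averaging scheme (mean over shared language subsets, then mean over tasks within a task type) is only one possible choice, not necessarily what the leaderboard does.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical layout: scores[task][split][subset] = main score of that task.
# This is an assumption for illustration, not the structure mteb returns.
custom_scores = {
    "MassiveIntentClassification": {"test": {"en": 0.70, "de": 0.60, "fr": 0.50}},
    "STS17": {"test": {"en-en": 0.90, "es-es": 0.80}},
}
# The reference model lacks the "en" subset, mirroring the "Missing subsets" warning.
reference_scores = {
    "MassiveIntentClassification": {"test": {"de": 0.65, "fr": 0.55}},
    "STS17": {"test": {"en-en": 0.88, "es-es": 0.84}},
}
task_types = {"MassiveIntentClassification": "Classification", "STS17": "STS"}


def task_score(scores, task, split, subsets):
    """Average the score of one task over the given language subsets."""
    return mean(scores[task][split][s] for s in sorted(subsets))


def aggregate(custom, reference, task_types, split="test"):
    """Return {task_type: (custom_mean, reference_mean)} over shared subsets only."""
    per_type = defaultdict(list)
    for task, task_type in task_types.items():
        # Keep only subsets present for both models so the comparison is like-for-like.
        shared = set(custom[task][split]) & set(reference[task][split])
        if not shared:
            continue  # nothing comparable for this task
        per_type[task_type].append(
            (
                task_score(custom, task, split, shared),
                task_score(reference, task, split, shared),
            )
        )
    return {
        task_type: (mean(c for c, _ in pairs), mean(r for _, r in pairs))
        for task_type, pairs in per_type.items()
    }


result = aggregate(custom_scores, reference_scores, task_types)
for task_type, (custom_mean, reference_mean) in result.items():
    print(f"{task_type}: custom={custom_mean:.4f} reference={reference_mean:.4f}")
```

In this sketch the en subset of MassiveIntentClassification is dropped for the custom model because the reference model has no score for it, which is the behaviour question 2 asks about.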
