I have four questions.
- Is there an automated way to aggregate the results of a custom model so they can be used in a local leaderboard?
  Right now the results contain scores for both the validation and test splits, across several languages. I was not sure whether I should only report the average stats over all languages, and only for the `test` split. Can you please clarify? (For context, a minimal aggregation sketch is included after this list.)
- When I download the result table via `mteb.load_results(tasks=tasks)`, it gives me a bunch of warnings like:
  ```
  MTOPDomainClassification: Missing subsets {'en'} for split test
  MassiveIntentClassification: Missing subsets {'en'} for split test
  MassiveScenarioClassification: Missing subsets {'en'} for split test
  MassiveIntentClassification: Missing subsets {'en'} for split test
  ```
  Is this expected? If I want an apples-to-apples comparison, I assume I also need to remove the `en` results of the `test` split for these datasets from my custom model's results, right?
- What metrics are reported for each task type/dataset? For example, for classification tasks, do you report F1 score or accuracy? And a similar question for the other tasks such as retrieval, summarization, etc.
  From the paper I found the following, but wanted to confirm (a small metadata lookup sketch also follows the list):
{"BitextMining": "F1", "Classification": "accuracy", "Clustering": "v_measure",
"PairClassification": "cosine_ap", "Reranking": "map", "Retrieval": "ndcg_at_10",
"STS": "cosine_spearman", "Summarization": "cosine_spearman"}
- The per-dataset results reported on the leaderboard differ from the results that can be downloaded via `results = mteb.load_results(tasks=tasks)` for some models. For example, on the `STS17` dataset, `"google/gemini-embedding-exp-03-07"` shows `88.57` on the leaderboard, while `mteb.load_results(tasks=tasks)` gives `91.6`. Any idea why?
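
For context on the first question, this is roughly the kind of aggregation I have in mind for a local leaderboard. It is only a sketch: it assumes the standard per-task result JSON files that `mteb` writes (a `scores` dict keyed by split, with one entry per subset/language carrying a `main_score`), and the output path and split name below are just illustrative.

```python
# Sketch: build a small local leaderboard from the result files written by a
# custom model run. Assumes one JSON file per task with a "scores" dict keyed
# by split; each split entry carries a "main_score". Paths are illustrative.
import json
from pathlib import Path
from statistics import mean


def local_leaderboard(results_dir: str, split: str = "test") -> dict[str, float]:
    """Return {task_name: mean main_score over all subsets of `split`}."""
    per_task: dict[str, float] = {}
    for path in Path(results_dir).glob("*.json"):
        data = json.loads(path.read_text())
        entries = data.get("scores", {}).get(split, [])
        if not entries:
            continue  # e.g. model_meta.json, or a task without this split
        per_task[path.stem] = mean(entry["main_score"] for entry in entries)
    return per_task


scores = local_leaderboard("results/my-custom-model/no_revision_available")
for task_name, score in sorted(scores.items()):
    print(f"{task_name}: {score:.4f}")
print(f"Average over tasks: {mean(scores.values()):.4f}")
```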
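
For the metrics question, this is the kind of check I had in mind to confirm the mapping above, assuming a recent `mteb` version where `get_tasks()` is available and each task exposes `metadata.main_score`; the task names are only examples.

```python
import mteb

# Print the task type and the main metric reported for each task;
# the task names here are only examples.
for task in mteb.get_tasks(tasks=["Banking77Classification", "STS17", "SciFact"]):
    print(task.metadata.name, task.metadata.type, task.metadata.main_score)
```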