
[EVAL] Add TUMLU benchmark #577

Open
gaydmi opened this issue Feb 19, 2025 · 10 comments

gaydmi (Contributor) commented Feb 19, 2025

Hello!
We just released a benchmark for Turkic languages. Would it make sense for me to add it to lighteval?

Evaluation short description

  • Why is this evaluation interesting?
    It is the first native-language MMLU-style benchmark for low-resource Turkic languages.

  • How is it used in the community?
    Just released; it consists of multiple-choice high-school exam questions.

Evaluation metadata


clefourrier (Member) commented:

cc @hynky1999, this could interest you I feel!

clefourrier (Member) commented:

Is the dataset already on Hugging Face?

gaydmi (Contributor, Author) commented Feb 19, 2025

@clefourrier Not exactly (it's in gated repos), but everything is already on GitHub.

clefourrier (Member) commented:

Gated sounds fine, can you share the path?

hynky1999 (Collaborator) commented:

Hi, I think it would be a very nice addition; we already have TurkishMMLU (which I think is also part of your dataset, right?).

To add it, we would need the following:

  1. Have translation literals for the languages you want to add: (https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2133)
  2. Add the dataset to hub
  3. Replace the TurkishMMLU with your dataset

Do you think you could do that? cc @gaydmi
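For reference, the three steps above typically come together as a community-task definition. The following is an untested sketch only: the prompt function, metric choice, subset name, and the exact `LightevalTaskConfig` fields are assumptions drawn from lighteval's community-task pattern and should be checked against the current API (the dataset path `jafarisbarov/TUMLU-mini` is the one shared later in this thread).

```python
# Hypothetical TUMLU task registration following lighteval's
# community-task pattern; field names and helper modules are
# assumptions to verify against the current lighteval API.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
import lighteval.tasks.default_prompts as prompt  # assumed prompt module

tumlu_azerbaijani = LightevalTaskConfig(
    name="tumlu:azerbaijani",
    suite=["community"],
    prompt_function=prompt.mmlu_harness,  # assumed MMLU-style prompt fn
    hf_repo="jafarisbarov/TUMLU-mini",
    hf_subset="azerbaijani",              # one subset per language
    evaluation_splits=["test"],
    metric=[Metrics.loglikelihood_acc],
)

# Community tasks are usually exposed via a module-level list:
TASKS_TABLE = [tumlu_azerbaijani]
```

One such config per language (or per language-subject pair, depending on the layout chosen below) would then replace the existing TurkishMMLU entries.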

ceferisbarov commented:

@gaydmi Thank you for bringing this up!

@hynky1999 I have a question. Our dataset can be split into subsets in three ways: (a) make each language a subset, (b) make each subject a subset, (c) make each language-subject combination a subset. Which one would you suggest? I could not find any similar examples in the repo.

gaydmi (Contributor, Author) commented Feb 24, 2025

@hynky1999 Hi, yes, working on it!
@ceferisbarov I personally think option (c) is the best, so we could just add new languages with their tasks.
Like in here: https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/multilingual/tasks.py#L2617

hynky1999 (Collaborator) commented:

I would say ideally use subsets for the languages, and then add a column to identify the actual task (subject). You can then use the hf_filter arg on the task.
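To illustrate the suggestion: with one subset per language and a subject column on each row, selecting a single task is just a row-level predicate, which is the shape lighteval's hf_filter argument expects (a callable from a dataset row to a bool). A minimal sketch, with hypothetical column names and plain dicts standing in for a `datasets` split:

```python
# Toy rows standing in for one language subset of TUMLU;
# the "subject" column name is a hypothetical choice.
rows = [
    {"question": "q1", "subject": "biology", "answer": "A"},
    {"question": "q2", "subject": "history", "answer": "C"},
    {"question": "q3", "subject": "biology", "answer": "B"},
]

def make_subject_filter(subject):
    """Build a row -> bool predicate, the shape hf_filter expects."""
    return lambda row: row["subject"] == subject

# datasets.Dataset.filter would apply the same predicate; here we
# filter plain dicts to keep the sketch self-contained.
biology_rows = list(filter(make_subject_filter("biology"), rows))
print([r["question"] for r in biology_rows])  # ['q1', 'q3']
```

The upside of this layout over per-pair subsets is that adding a new subject requires no new subset, only new rows.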

ceferisbarov commented:

Both options sound good to me. I have added the dataset to Hugging Face:

https://huggingface.co/datasets/jafarisbarov/TUMLU-mini

@gaydmi let me know if I can help in any other way.

hynky1999 (Collaborator) commented:

Awesome. cc @gaydmi, happy to review the PR once it's ready.
