Hi, lmms-eval team,
This PR incorporates our recent work, MEGA-Bench, a multimodal evaluation suite with over 500 real-world tasks and 45 metrics.
The evaluation process involves two steps:

1. Run lmms-eval to produce the response/submission file.
2. Run our evaluator, which applies 45 diverse metrics, to get the scores and the multi-dimensional breakdown results (see the sketch after the example command below).
Example command for generating the response/submission file:
```bash
# Core set (440 tasks)
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava_onevision \
    --tasks megabench_core \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_ov_megabench_core \
    --output_path ./logs/ \
    --model_args=pretrained=lmms-lab/llava-onevision-qwen2-7b-ov,conv_template=qwen_1_5,model_name=llava_qwen
```
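For step 2, a minimal sketch of running the evaluator on the submission file from step 1 is shown below. The entry point `run_evaluation.py`, its flag names, and the output path are illustrative assumptions, not the confirmed MEGA-Bench CLI; the README has the authoritative steps.

```bash
# Hypothetical invocation: the script name and flags below are placeholders,
# not the confirmed evaluator interface -- follow the README for exact steps.
# <submission_file> is the response file that step 1 wrote under ./logs/.
python run_evaluation.py \
    --input_file ./logs/<submission_file>.json \
    --output_dir ./megabench_scores/
```

The evaluator applies the 45 metrics per task and produces both the overall scores and the multi-dimensional breakdown results.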
Detailed steps are recorded in this README file.
Please advise on the format of the README or any other details. Thanks!