[Benchmark] Add MEGA-Bench #724


Merged
merged 16 commits into open-compass:main on Mar 28, 2025

Conversation

TianhaoLiang2000
Contributor

@TianhaoLiang2000 TianhaoLiang2000 commented Jan 15, 2025

Hi, VLMEvalKit team,

This PR incorporates our recent work, MEGA-Bench, a multimodal evaluation suite with over 500 real-world tasks and 45 metrics.

The evaluation process involves two steps: 1) run VLMEvalKit to produce the response/submission file; 2) run our evaluator with 45 diverse metrics to get the scores.

Example usage:

python3 run.py \
    --data MEGABench_core_single_image_16frame \
    --model Qwen2-VL-7B-Instruct \
    --verbose \
    --work-dir ~/LMUData

@TianhaoLiang2000 TianhaoLiang2000 changed the title [Benchmark] Add MEGA-Bench core and core_single_image support [WIP][Benchmark] Add MEGA-Bench core and core_single_image support Jan 15, 2025
@TianhaoLiang2000 TianhaoLiang2000 changed the title [WIP][Benchmark] Add MEGA-Bench core and core_single_image support [Benchmark] Add MEGA-Bench Feb 21, 2025
@mary-0830
Contributor

Hi, may I ask when this benchmark will be added?

@FangXinyu-0913
Collaborator

Hi @mary-0830, @TianhaoLiang2000. I'm sorry for not getting back to you sooner. Due to the complexity of this dataset, we will carry out the integration this week. Thank you for your patience.

@FangXinyu-0913 FangXinyu-0913 self-assigned this Mar 26, 2025
@FangXinyu-0913 FangXinyu-0913 merged commit c47c38e into open-compass:main Mar 28, 2025
7 checks passed
@mary-0830
Contributor

mary-0830 commented Mar 28, 2025

Hi @TianhaoLiang2000 @FangXinyu-0913. I want to ask why MEGA-Bench (https://huggingface.co/datasets/TIGER-Lab/MEGA-Bench), which is an image-and-text benchmark, was added under video-vqa this time?

@FangXinyu-0913
Collaborator

MEGA-Bench is a benchmark containing both images (single-image and multiple-image) and video, so to be compatible with multiple modalities it was added under video-vqa. In the actual code, video inputs and single/multiple-image inputs follow different logic to construct prompts, which does not affect the evaluation results.
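The modality-dependent prompt construction described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual VLMEvalKit code; the record fields (`modality`, `video_path`, `image_paths`, `question`) are assumed names.

```python
# Hypothetical sketch: build a prompt message list that branches on modality,
# so video and single/multiple-image inputs take different paths while the
# question text is appended the same way in both cases.
def build_prompt(record):
    if record["modality"] == "video":
        # video tasks attach the video file (frames are sampled downstream)
        msgs = [{"type": "video", "value": record["video_path"]}]
    else:
        # image tasks attach one message per image (single- or multi-image)
        msgs = [{"type": "image", "value": p} for p in record["image_paths"]]
    msgs.append({"type": "text", "value": record["question"]})
    return msgs
```

Either branch yields the same interleaved message structure that the model wrapper consumes, which is why the split does not change the evaluation results.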

@mary-0830
Contributor

Hi, thanks for your reply. One question: if I want to run the evaluation on only a subset of the tasks, how should I modify the code? For example, suppose I only select App_Function_Understanding and Compound_Search_and_Calculate from Information_Extraction.
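One way to run only selected subtasks is to filter the loaded records before evaluation. A minimal sketch, assuming each record carries a task-name field (the field name `task_name` and the record structure are assumptions, not the actual VLMEvalKit schema):

```python
# Hypothetical sketch: keep only the chosen MEGA-Bench subtasks.
# Adapt the field name to whatever the real dataset loader uses.
KEEP_TASKS = {
    "App_Function_Understanding",
    "Compound_Search_and_Calculate",
}

def filter_records(records):
    """Return only records whose task name is in KEEP_TASKS."""
    return [r for r in records if r.get("task_name") in KEEP_TASKS]
```

The same idea works whether the data lives in a list of dicts or a DataFrame; the point is to restrict the task set before the model is queried rather than after scoring.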

@mary-0830
Contributor

Hi, I have solved that problem. Now I have another question: why are the images resized and converted from RGBA on every test run? I noticed that they were already converted and saved during the previous run. Could a check be added so that if the converted image already exists at the target path, the conversion is skipped?
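The suggested optimization is a standard cache-on-disk check. A minimal sketch of the caching logic, kept generic because the real VLMEvalKit conversion helper is not shown here: the actual resize/RGBA-to-RGB step (e.g. Pillow's `Image.convert("RGB")` plus a resize) would be passed in as the `convert` callable.

```python
# Sketch: skip the per-run resize/RGBA conversion when the converted file
# already exists at the target path. `convert` is a placeholder for the
# project's real image-conversion routine.
import os

def ensure_converted(src_path, dst_path, convert):
    """Call convert(src, dst) only if dst is missing; return dst either way."""
    if os.path.exists(dst_path):
        # cached result from an earlier run -- reuse it
        return dst_path
    convert(src_path, dst_path)
    return dst_path
```

One caveat with this pattern: if the conversion parameters (target size, color mode) change between runs, stale cached files would be reused, so the cache path should encode those parameters or the cache should be cleared when they change.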

@FangXinyu-0913
Collaborator


Thank you for the proposal and for your support. You can make the modification and submit a pull request, and we will review it and merge it into the main branch.

Mercury7353 pushed a commit to Mercury7353/VLMEvalKit that referenced this pull request Apr 28, 2025
* add MEGA-Bench core dataset support

* add MEGA-Bench core dataset support

* add MEGA-Bench core dataset support

* add open-ended task

* merge upstream to main

* add README.md

* [Fix and Add Features] fix some bug in megabench, add support for resume evaluation, change data.zip download problem

* fix bugs of open_ended judge with eval_context

* fix bugs of open_ended judge with eval_context

* fix snapshot_download problem

* Update video_dataset_config.py

* fix import problem

* fix import problem

---------

Co-authored-by: Haodong Duan <[email protected]>
Co-authored-by: kennymckormick <[email protected]>
Co-authored-by: FangXinyu-0913 <[email protected]>