[Benchmark] Add MEGA-Bench #724


Merged
merged 16 commits into open-compass:main on Mar 28, 2025

Conversation

TianhaoLiang2000
Contributor

@TianhaoLiang2000 TianhaoLiang2000 commented Jan 15, 2025

Hi, VLMEvalKit team,

This PR incorporates our recent work, MEGA-Bench, a multimodal evaluation suite with over 500 real-world tasks and 45 metrics.

The evaluation process involves two steps: 1) run VLMEvalKit to produce the response/submission file; 2) run our evaluator with 45 diverse metrics to get the scores.

Example usage:

python3 run.py \
    --data MEGABench_core_single_image_16frame \
    --model Qwen2-VL-7B-Instruct \
    --verbose \
    --work-dir ~/LMUData

@TianhaoLiang2000 TianhaoLiang2000 changed the title [Benchmark] Add MEGA-Bench core and core_single_image support [WIP][Benchmark] Add MEGA-Bench core and core_single_image support Jan 15, 2025
@TianhaoLiang2000 TianhaoLiang2000 changed the title [WIP][Benchmark] Add MEGA-Bench core and core_single_image support [Benchmark] Add MEGA-Bench Feb 21, 2025
@mary-0830
Contributor

Hi, may I ask when this benchmark will be added?

@FangXinyu-0913
Collaborator

Hi @mary-0830, @TianhaoLiang2000. I'm sorry for not getting back to you sooner. Due to the complexity of this dataset, we will carry out the integration this week. Thank you for your patience.

@FangXinyu-0913 FangXinyu-0913 self-assigned this Mar 26, 2025
@FangXinyu-0913 FangXinyu-0913 merged commit c47c38e into open-compass:main Mar 28, 2025
7 checks passed
@mary-0830
Contributor

mary-0830 commented Mar 28, 2025

Hi @TianhaoLiang2000 @FangXinyu-0913. I want to ask why MEGA-Bench (https://huggingface.co/datasets/TIGER-Lab/MEGA-Bench), which is an image-and-text benchmark, was added under video-vqa this time?

@FangXinyu-0913
Collaborator

MEGA-Bench is a benchmark containing both images (single-image and multiple-image) and video, so to be compatible with multiple modalities it was added under video-vqa. In the actual code, video inputs and single/multiple-image inputs follow different logic to construct prompts, which does not affect the evaluation results.
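The modality-dependent prompt construction described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual VLMEvalKit code; the record fields (`modality`, `video_path`, `image_paths`, `question`) are assumed names.

```python
# Hypothetical sketch: build a prompt message list that branches on modality,
# so video and single/multiple-image inputs take different paths while the
# question text is appended the same way in both cases.
def build_prompt(record):
    if record["modality"] == "video":
        # video tasks attach the video file (frames are sampled downstream)
        msgs = [{"type": "video", "value": record["video_path"]}]
    else:
        # image tasks attach one message per image (single- or multi-image)
        msgs = [{"type": "image", "value": p} for p in record["image_paths"]]
    msgs.append({"type": "text", "value": record["question"]})
    return msgs
```

Either branch yields the same interleaved message structure that the model wrapper consumes, which is why the split does not change the evaluation results.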

@mary-0830
Contributor

Hi, thanks for your reply. One question: if I want to run the evaluation on only a subset of the tasks, how should I modify the code? For example, suppose I only select App_Function_Understanding and Compound_Search_and_Calculate from Information_Extraction.
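One way to run only selected subtasks is to filter the loaded records before evaluation. A minimal sketch, assuming each record carries a task-name field (the field name `task_name` and the record structure are assumptions, not the actual VLMEvalKit schema):

```python
# Hypothetical sketch: keep only the chosen MEGA-Bench subtasks.
# Adapt the field name to whatever the real dataset loader uses.
KEEP_TASKS = {
    "App_Function_Understanding",
    "Compound_Search_and_Calculate",
}

def filter_records(records):
    """Return only records whose task name is in KEEP_TASKS."""
    return [r for r in records if r.get("task_name") in KEEP_TASKS]
```

The same idea works whether the data lives in a list of dicts or a DataFrame; the point is to restrict the task set before the model is queried rather than after scoring.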

@mary-0830
Contributor

Hi, I have solved that problem. Now I have another question: why are the images resized and converted from RGBA on every test run? I noticed that they were already converted and saved during the previous run. Could a check be added so that if the converted image already exists at the target path, the conversion is skipped?
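The suggested optimization is a standard cache-on-disk check. A minimal sketch of the caching logic, kept generic because the real VLMEvalKit conversion helper is not shown here: the actual resize/RGBA-to-RGB step (e.g. Pillow's `Image.convert("RGB")` plus a resize) would be passed in as the `convert` callable.

```python
# Sketch: skip the per-run resize/RGBA conversion when the converted file
# already exists at the target path. `convert` is a placeholder for the
# project's real image-conversion routine.
import os

def ensure_converted(src_path, dst_path, convert):
    """Call convert(src, dst) only if dst is missing; return dst either way."""
    if os.path.exists(dst_path):
        # cached result from an earlier run -- reuse it
        return dst_path
    convert(src_path, dst_path)
    return dst_path
```

One caveat with this pattern: if the conversion parameters (target size, color mode) change between runs, stale cached files would be reused, so the cache path should encode those parameters or the cache should be cleared when they change.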

@FangXinyu-0913
Collaborator


Thank you for the proposal and for your support. You can make the modification and submit a pull request, and we will review it and merge it into the main branch.

Mercury7353 pushed a commit to Mercury7353/VLMEvalKit that referenced this pull request Apr 28, 2025
* add MEGA-Bench core dataset support

* add MEGA-Bench core dataset support

* add MEGA-Bench core dataset support

* add open-ended task

* merge upstream to main

* add README.md

* [Fix and Add Features] fix some bug in megabench, add support for resume evaluation, change data.zip download problem

* fix bugs of open_ended judge with eval_context

* fix bugs of open_ended judge with eval_context

* fix snapshot_download problem

* Update video_dataset_config.py

* fix import problem

* fix import problem

---------

Co-authored-by: Haodong Duan <[email protected]>
Co-authored-by: kennymckormick <[email protected]>
Co-authored-by: FangXinyu-0913 <[email protected]>