add_ocrbench #28

echo840 · 2024-03-24T10:05:43Z

Add the evaluation of OCRBench.

Luodian · 2024-03-24T16:38:58Z

@echo840 Thanks for committing to lmms-eval! We are going to check it soon.

* add mmme
* black
* add model specific prompt and gen kwargs
* black
* add yaml config to supprot multi-model eval
* print table at the end
* refactor multi model code
* add chartqa
* black
* add ai2d
* black
* update chartqa
* blacl
* update ai2d dataset
* black
* add qwenvl
* add infovqa and docvqa

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]>

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]>

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: 
Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add 
chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Refactor CLI evaluate function and improve error logging --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]>

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: 
Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add 
chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Refactor CLI evaluate function and improve error logging --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]>

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: 
Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add 
chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]>

* Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and 
docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: 
Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add 
chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]>

* add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8d Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun 
Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and 
utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing 
tasks
* Add wandb import and integration
---------
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: kcz358 <[email protected]>
* Remove scienceqa_img task configuration
* eval scienceqa with no images
---------
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: kcz358 <[email protected]>
* Update hb_doc_to_text function to remove unnecessary line break
* Add Fuyu model and update OtterHD model
* Refactor model response handling and fix image processing bug
* Refactor flatten method to support only getting the first element
* Add support for specifying timezone in datetime string
  Update flatten method in OtterHD class
  Update get_datetime_str function in utils.py
* Fix condition for checking wandb_args_dict in __main__.py
* Commented out assertions for batch size in Fuyu model
* Add warning message for existing output file

* add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a3 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun 
Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and 
utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing 
tasks
* Add wandb import and integration
---------
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: kcz358 <[email protected]>
* Remove scienceqa_img task configuration
* eval scienceqa with no images
---------
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: kcz358 <[email protected]>
* Update hb_doc_to_text function to remove unnecessary line break
* Add Fuyu model and update OtterHD model
* Refactor model response handling and fix image processing bug
* Refactor flatten method to support only getting the first element
* Add support for specifying timezone in datetime string
  Update flatten method in OtterHD class
  Update get_datetime_str function in utils.py
* Fix condition for checking wandb_args_dict in __main__.py
* Commented out assertions for batch size in Fuyu model
* Add warning message for existing output file

…function (#35) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8d Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email 
protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME 
task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email 
protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix 
error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit d0c8c61 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit f4fd4fd Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8d Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric 
* add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> 
Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa 
(#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun 
Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function 
in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c7ffa8d Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * 
add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update 
doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc25 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- 
Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (#35) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a3 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email 
protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME 
task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email 
protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix 
error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 4b604e7 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 799a6bc Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a3 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric 
* add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> 
Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa 
(#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun 
Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function 
in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 0f183a3 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * 
add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update 
doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae93 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf0 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- 
Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

* add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit d0c8c61d9a23686d31c7e014f0c15d802e04ee61 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit f4fd4fd29b45436a96fe65395f0922612f598052 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 2da8f918c37495b3447b9c24e74234ad0bba8cbf Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit d0c8c61d9a23686d31c7e014f0c15d802e04ee61 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit f4fd4fd29b45436a96fe65395f0922612f598052 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 4b604e75cfde49df52e4abd90be4876ed9a1b08f Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 799a6bcb9033656115755c5169f8c342eb927d54 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 3d44977c9254d1ee5254b2ca24c8cc54984e84b0 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit a38ffeb692fbeb9deebe20f65b0f3e041823e695 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit e24607fd5725aabb7f6db5fa457b5e6a5123c199 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 4b604e75cfde49df52e4abd90be4876ed9a1b08f Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 799a6bcb9033656115755c5169f8c342eb927d54 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit d0c8c61d9a23686d31c7e014f0c15d802e04ee61 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit f4fd4fd29b45436a96fe65395f0922612f598052 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 2da8f918c37495b3447b9c24e74234ad0bba8cbf Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit d0c8c61d9a23686d31c7e014f0c15d802e04ee61 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit f4fd4fd29b45436a96fe65395f0922612f598052 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c7ffa8dee96e228c6519154d5a00742b35caa3f2' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c7ffa8dee96e228c6519154d5a00742b35caa3f2 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
0390783595c41232352599ab78fbe5949615e982 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 407bc2500c162d8949fbaae3d11d522afd2c9f28 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f80465fd0f30781c8c36b46c1d6d7bba751f9e33' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 2df0ce76ef836be1cb8ffbf3c854fe05563647b0 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit af6c7a2b8c2959495dc351e6f6eb2a442efe4e94 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 26da729c40008f72ce3f10c932874f120f290e26 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit acbb1a1997c5159709e3b81c3f0292b2f9def109 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit b33ac32f0ff28777204eaaf27a963200024081df Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f80465fd0f30781c8c36b46c1d6d7bba751f9e33 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 4b604e75cfde49df52e4abd90be4876ed9a1b08f Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 799a6bcb9033656115755c5169f8c342eb927d54 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 3d44977c9254d1ee5254b2ca24c8cc54984e84b0 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit a38ffeb692fbeb9deebe20f65b0f3e041823e695 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit e24607fd5725aabb7f6db5fa457b5e6a5123c199 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 4b604e75cfde49df52e4abd90be4876ed9a1b08f Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 799a6bcb9033656115755c5169f8c342eb927d54 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '0f183a394426d3bf88884b4e2258ab53406bc705' * Squashed commit of the following: commit b81ed2ce4d0e226df7a41bddd82fe1f9d46a27fc Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 0f183a394426d3bf88884b4e2258ab53406bc705 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
1e2ae936c90a15d684926e43a38aac86935f38c5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 10bbaf01c0a4164b6f1d2628367befccf8f39c24 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0a403e6f5e17c70a50983c83a132edf0fdcd98de Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to support multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * black * update ai2d dataset * black * add qwenvl * add infovqa and docvqa

…lvingLMMs-Lab#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a4 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 
Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * 
Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab736 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c11ae4 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 
+0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for 
image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file

…lvingLMMs-Lab#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e6257 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac 
Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * 
Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit a853223 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 
+0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for 
image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file

…lvingLMMs-Lab#33) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 
Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * 
Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd4558 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit f125889 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 
+0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for 
image-related tasks
* Delete unused task configuration files
* Remove coco_train.yaml configuration file
* Update task name in mmmu.yaml
* Fix error message for missing tasks
* Add wandb import and integration

---------

Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: kcz358 <[email protected]>

* Remove scienceqa_img task configuration
* eval scienceqa with no images

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: kcz358 <[email protected]>

* Update hb_doc_to_text function to remove unnecessary line break
* Add Fuyu model and update OtterHD model
* Refactor model response handling and fix image processing bug
* Refactor flatten method to support only getting the first element
* Add support for specifying timezone in datetime string

Update flatten method in OtterHD class

Update get_datetime_str function in utils.py

* Fix condition for checking wandb_args_dict in __main__.py
* Commented out assertions for batch size in Fuyu model
* Add warning message for existing output file

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f636 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 4a1183c Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7664839 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 05487a4 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f636 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 4a1183c Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 7b7f636 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 4a1183c Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f9 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 0d4e69f Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 2b01738 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 2f61ad5 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f9 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 0d4e69f Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 1c9c7f9 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 0d4e69f Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c5dbd5 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit af73a51 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit accfaff Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c5dbd5 Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 708de71 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c5dbd5 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a4 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab736 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c11ae4 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit c37504a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit cb7b75e Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a4 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab736 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c11ae4 Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c2050a4 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab736 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 1c11ae4 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e6257 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit a853223 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6e7cd87 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit efd3510 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e6257 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit a853223 Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 49e6257 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit a853223 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811fac Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit b8ba33c Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7dd84f3 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit a781057 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit b8ba33c Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 6d570ac Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa5 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit b8ba33c Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8e Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 47a6675 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7eefb7e Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 81d7b9f Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8e Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 47a6675 Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit d8a4f8e Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit 47a6675 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

…function (EvolvingLMMs-Lab#35) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list 
commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and 
integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd4558 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit f125889 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 
09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt 
messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6a4b81b Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (EvolvingLMMs-Lab#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit fab8704 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (EvolvingLMMs-Lab#33) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add 
dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed 
commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd4558 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit f125889 Author: Li Bo <[email protected]> Date: Sun 
Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi 
model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text 
function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit ebe4eb8 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (EvolvingLMMs-Lab#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 
<[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List 
task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd4558 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (EvolvingLMMs-Lab#30) * mmmu_test * black commit f125889 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (EvolvingLMMs-Lab#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 
18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d10 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets

* add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7664839d1765e09b06e6cf59c12cb895ef71c40e Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 05487a4e1f1dd1ab20d087399a47502716929a9b Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 
ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 1f8780df5e89ee50f349361bb5ea7351a73e0c19 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7664839d1765e09b06e6cf59c12cb895ef71c40e Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 05487a4e1f1dd1ab20d087399a47502716929a9b Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 2b01738ba36ee632712135d38f45ea40f1c1323a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 2f61ad5c3da7411eccda597afadcb64d573c5193 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 04a4076120c4d337d70992b82bf2b4fa4c700359 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit b3c423a93d944a2621c1fa4192616af048e5b77c Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 1c5354e09283b03f1c0068d39b82f8bfa73d4184 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 2b01738ba36ee632712135d38f45ea40f1c1323a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 2f61ad5c3da7411eccda597afadcb64d573c5193 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit af73a51ca7940095310f725544bd3473b67b412c Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit accfaffdc9ba3002757d1ee167063c7aa6a12394 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit f6f7adae7485defcca27deafb2b19b37733233c6 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit af73a51ca7940095310f725544bd3473b67b412c Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit accfaffdc9ba3002757d1ee167063c7aa6a12394 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit c37504a11db9763a0cb65e1cfc9081d8e60aa0fc Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit cb7b75e6f96a9b933557c570bea72a12b7800014 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit d887d8a25654322aa62cff6e94b39c262ebc8ae0 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit 96b17d51b831b62da66685444f97188e1af9ad7a Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit b94afc7866a362feb80b7e9a757a6cf2dbd78aa8 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit c37504a11db9763a0cb65e1cfc9081d8e60aa0fc Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit cb7b75e6f96a9b933557c570bea72a12b7800014 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6e7cd871ca881e5002bbaa3dd7774d34fce12811 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit efd3510236c5ca6948d65a7150fd7a5925902f3d Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 7037fd2991af7afe522d9492878cde4b2699bc43 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6e7cd871ca881e5002bbaa3dd7774d34fce12811 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit efd3510236c5ca6948d65a7150fd7a5925902f3d Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit a781057ad07b0a60c7ef682f864be598b2436b7c Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 04a4076120c4d337d70992b82bf2b4fa4c700359 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit b3c423a93d944a2621c1fa4192616af048e5b77c Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit c3b0da62994f646141456b60baaa3ee5713f38fa Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit a781057ad07b0a60c7ef682f864be598b2436b7c Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7eefb7e3bb827b0e784ed0395e4125c535b6eeef Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 81d7b9fdf3e662405e0ea358900a4c6981cc502f Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit ae76855543ee127e79809843378a18aa06d90261 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7eefb7e3bb827b0e784ed0395e4125c535b6eeef Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 81d7b9fdf3e662405e0ea358900a4c6981cc502f Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- …

* add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6a4b81baa42b29457cbaea42043723c2332ad5ba Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit fab87047e683d9982ea0f544feb3e2fce4e1fbf4 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 

QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 
ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 1f8780df5e89ee50f349361bb5ea7351a73e0c19 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7664839d1765e09b06e6cf59c12cb895ef71c40e Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 05487a4e1f1dd1ab20d087399a47502716929a9b Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '7b7f6368e8e04cddbd6e7f572f1099b7911cbe04' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 7b7f6368e8e04cddbd6e7f572f1099b7911cbe04 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
6ee856b61bbb0156dd72d454430cd01a246b5e61 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 4a1183c563835c366ea54a28e1a5761a193b6704 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'ad8d9da1fb40c446202bf9b0095b02262df2ffc8' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit dbba2fe6447b0dfd4bb89a368f62178f2b253006 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit c09b621195878300417315a97efdec25e67dd7f5 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 864a1aba26388276b7e57717b89520fcc77b3f62 Merge: ab898e4 ad8d9da Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit ab898e4fd30bf83888125d48b80bc86b01cb5d39 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit c0ea54d49cb65b747d7e8fccac75838acabe05db Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit ad8d9da1fb40c446202bf9b0095b02262df2ffc8 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 2b01738ba36ee632712135d38f45ea40f1c1323a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 2f61ad5c3da7411eccda597afadcb64d573c5193 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 04a4076120c4d337d70992b82bf2b4fa4c700359 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit b3c423a93d944a2621c1fa4192616af048e5b77c Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 1c5354e09283b03f1c0068d39b82f8bfa73d4184 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 2b01738ba36ee632712135d38f45ea40f1c1323a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 2f61ad5c3da7411eccda597afadcb64d573c5193 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 1c9c7f95a6b03950c05f47216c7dbf4c4d3edd29 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
9d06741f31439e6ac34764612664467239b63253 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 0d4e69f54d996672ab0471531837004f80ba9b10 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '0dc9a47afe9a61214f11053dae5641716052f30f' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 0dc9a47afe9a61214f11053dae5641716052f30f Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit af73a51ca7940095310f725544bd3473b67b412c Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit accfaffdc9ba3002757d1ee167063c7aa6a12394 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit f6f7adae7485defcca27deafb2b19b37733233c6 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit af73a51ca7940095310f725544bd3473b67b412c Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit accfaffdc9ba3002757d1ee167063c7aa6a12394 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '708de71d7c634c51ade4443f7a8590dca74561ed' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 708de71d7c634c51ade4443f7a8590dca74561ed Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
e19ec39d72c2781f1f2d174094d3acfb4ada7861 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c5dbd5c7f65394a6395db59e97d148576a3ad20 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5fb3e5d50de23f7f9f7bb10510e21ffb22c02adb Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit c37504a11db9763a0cb65e1cfc9081d8e60aa0fc Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit cb7b75e6f96a9b933557c570bea72a12b7800014 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit d887d8a25654322aa62cff6e94b39c262ebc8ae0 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit 96b17d51b831b62da66685444f97188e1af9ad7a Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit b94afc7866a362feb80b7e9a757a6cf2dbd78aa8 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit c37504a11db9763a0cb65e1cfc9081d8e60aa0fc Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit cb7b75e6f96a9b933557c570bea72a12b7800014 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'c2050a435b47dfba638b6ba6a1600515a9f61b4c' * Squashed commit of the following: commit 55411a8236a6a4af45c9d3d73349d9308f1b11dd Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit c2050a435b47dfba638b6ba6a1600515a9f61b4c Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
27ab7369c986607ad08e356e3bd951864c845e22 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 1c11ae4aeecd3305e99f3baaa54d2c5914d6a6b7 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '4b30564ccba6af8112cd9fedf36a16bb6571b1d9' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 4b30564ccba6af8112cd9fedf36a16bb6571b1d9 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6e7cd871ca881e5002bbaa3dd7774d34fce12811 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit efd3510236c5ca6948d65a7150fd7a5925902f3d Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 7037fd2991af7afe522d9492878cde4b2699bc43 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6e7cd871ca881e5002bbaa3dd7774d34fce12811 Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit efd3510236c5ca6948d65a7150fd7a5925902f3d Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '49e625761a6853595641a0a411c96168490dabad' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 49e625761a6853595641a0a411c96168490dabad Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
da7a8df0ec859a7e69bf0ace845f00ff3717ac75 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit a853223fa8da0ec1d59040768c896c1526b10dff Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'e811faca3743a9b0c865144145198cc5eea21393' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 15f168756d8f92f53dea87548efe606d0d1401b5 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit 290c53c0ea60868d2f0fb31bee1ac8d213b08d36 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 27bc5c84f9d9f2ff56b2adfa69d23894f4027100 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 09d42b879158738f5484f31d514c6b400a418551 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit e8110aacf87bb0450db298b0993164765e0a624f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit e811faca3743a9b0c865144145198cc5eea21393 Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit a781057ad07b0a60c7ef682f864be598b2436b7c Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 04a4076120c4d337d70992b82bf2b4fa4c700359 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit b3c423a93d944a2621c1fa4192616af048e5b77c Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit c3b0da62994f646141456b60baaa3ee5713f38fa Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit a781057ad07b0a60c7ef682f864be598b2436b7c Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed' * Squashed commit of the following: commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032 Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit 6d570ac1d98a03585c8119ccb362e13ab2172fed Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
fbb7aa57856f800d6c18413318830f4bbc6c8157 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7eefb7e3bb827b0e784ed0395e4125c535b6eeef Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 81d7b9fdf3e662405e0ea358900a4c6981cc502f Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 380494bb2417fae1bcc1535ad8b67df7af667619 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit e46b937aeeed45f5dd574b852459bfb416d165fd Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit ae76855543ee127e79809843378a18aa06d90261 Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 7eefb7e3bb827b0e784ed0395e4125c535b6eeef Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit 81d7b9fdf3e662405e0ea358900a4c6981cc502f Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'd8a4f8ef094e37c987863da971cbc51637b92b43' * Squashed commit of the following: commit 96d95b3cb3540cd17bcab31f1a85ad0d04a12f1e Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit d8a4f8ef094e37c987863da971cbc51637b92b43 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
a2b4a2a27d6f6f712e5214bb3bb55c0a679b9499 Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit 47a6675ce97fc0e0732c195258e6c29f3b3ff275 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '89545d0517eb5891710f2d7191ca7b650723701e' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit 992be447a9fdf701fc910177653017e3978bf56d Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit baf78ea27df4dfe5d88bc2abca707e117a4f9661 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit e323545d9f3a5e0f2219618a4b024aea3ff6e353 Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit dbe09071a986c68e6b2b60cbde501da8d498535f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit 844a47e5d49c71e5297decdf7510d8a1a214f934 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 89545d0517eb5891710f2d7191ca7b650723701e Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

* add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 
5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks 
* Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: 
kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add 
qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6a4b81baa42b29457cbaea42043723c2332ad5ba Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit fab87047e683d9982ea0f544feb3e2fce4e1fbf4 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for 
full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update 
ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan 
<[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and 
QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove 
scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 
fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to 
supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm 
progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task 
list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update API configuration and file paths * Refactor evaluate_by_chatgpt function in utils.py * Add hallusion_output_vd_model.json to .gitignore * Add timeout to API request * Refactor file path generation and remove unnecessary suffix in log samples output names * Refactor code and add output path handling * Update lmms-eval API and add new models and datasets * Refactor directory structure for RefCOCO+ and RefCOCOg datasets * Fix error logging in get_eval and parse_score functions * Update .gitignore and mme.yaml * Squashed commit of the following: commit 6e6fe00bf9d5fcfd351c164285c569e53f38e280 Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:43:28 2024 +0800 black commit 938c7729a9176e459531cbd00bb6f8d69691258b Author: jzhang38 <[email protected]> Date: Fri Feb 2 13:42:03 2024 +0800 adapt qwen to sqa, gqa, ai2d, docvqa commit 2412a0072cc8840593c90e5bdeff64aa8f375bdc Author: Li Bo <[email protected]> Date: Thu Feb 1 16:20:27 2024 +0800 [Dataset] fix hallusion benchmark, 
add saving logic inside aggregate function (#35) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in 
the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error 
message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit 
e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update 
chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove scienceqa_img task configuration * eval scienceqa with no images --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: kcz358 <[email protected]> * Update hb_doc_to_text function to remove unnecessary line break * Add Fuyu model and update OtterHD model * Refactor model response handling and fix image processing bug * Refactor flatten method to support only getting the first element * Add support for specifying timezone in datetime string Update flatten method in OtterHD class Update get_datetime_str function in utils.py * Fix condition for checking wandb_args_dict in __main__.py * Commented out assertions for batch size in Fuyu model * Add warning message for existing output file * Fix batch size issue in OtterHD model * Squashed commit of the following: commit 6a4b81baa42b29457cbaea42043723c2332ad5ba Author: Li Bo <[email protected]> Date: Wed Jan 31 16:00:22 2024 +0800 [Datasets] add hallubench (#34) * Add hallu bench * Fix hall_b gpt eval bugs --------- Co-authored-by: kcz358 <[email protected]> commit fab87047e683d9982ea0f544feb3e2fce4e1fbf4 Author: Li Bo <[email protected]> Date: Wed Jan 31 14:23:15 2024 +0800 [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33) * add fuyu * Merge commit 'ebe4eb8dffcce06f7be393478d35d76de82a3836' * Squashed commit of the following: commit 72ce63c90098fa7a7364f7a1113ce4b3b23b981a Author: kcz358 <[email protected]> Date: Tue Jan 30 19:39:57 2024 +0800 Add hallu bench commit ebe4eb8dffcce06f7be393478d35d76de82a3836 Author: Pu Fanyi <[email protected]> 
Date: Tue Jan 30 14:52:51 2024 +0800 scienceqa for full set (#32) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add 
ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration * Update generation kwargs for LMMS tasks * Update lmms_eval MME task configuration and utils * Update generation_kwargs in lmms_eval tasks * Update doc_to_text function in coco and okvqa tasks * Add COCO 2017 version * Update task name in coco_test2017.yaml * Squashed commit of the following: commit 
0fd45585aecf41e04bb6510cf09c0b829bd0f49d Author: Zhang Peiyuan <[email protected]> Date: Mon Jan 29 22:41:33 2024 +0800 Add/mmmu test (#30) * mmmu_test * black commit f1258892713f588f8d65826f9141e38048f5ff31 Author: Li Bo <[email protected]> Date: Sun Jan 28 22:19:13 2024 +0800 [Dataset Check] dataset check and add wandb logging (#29) * Remove unused code and configuration file * Remove docvqa.yaml and update vizwizvqa.yaml * lint * Add dataset_kwargs to vizwizvqa.yaml * Add dataset_kwargs to vizwizvqa.yaml * textvqa (#27) * Update textvqa.yaml and utils.py * Fix YAML formatting in textvqa.yaml and remove unused files * remove useless matric * add textvqa val & test * Update progress bar description in evaluator.py * Update submission file names in VizWizVQA tasks * Update output path to include log samples suffix * Update submission file paths in OKVQA and VizWizVQA tasks * Refactor llava-in-the-wild.yaml and utils.py * Update metric for llava evaluation * Refactor logging message in Task class * Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b' * Fix formatting issues and add progress bar closing statements * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml * Update tqdm progress bar in OtterHD model * Squashed commit of the following: commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> 
Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * Fix error handling in loading YAML config files * Squashed commit of the following: commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8 Author: kcz358 <[email protected]> Date: Sun Jan 28 12:41:40 2024 +0800 Fix key bugs commit eae210c3700a59b7d5cc9de46fcb855f443096aa Author: kcz358 <[email protected]> Date: Sun Jan 28 09:46:19 2024 +0800 Black lint commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae Merge: ab898e4 fb209e4 Author: kcz358 <[email protected]> Date: Sun Jan 28 09:45:31 2024 +0800 Merge branch 'main' into kc/list_tasks_num commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed Author: kcz358 <[email protected]> Date: Sun Jan 28 09:44:23 2024 +0800 Enable list all tasks num commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f Author: kcz358 <[email protected]> Date: Sun Jan 28 09:41:32 2024 +0800 Exclude train yaml file in the task list commit 5553d106e5ffd84b280b3d5a3c8d47c35e2d310b Author: Zhang Peiyuan <[email protected]> Date: Sun Jan 28 02:04:57 2024 +0800 Add InfoVQA, DocVQA, and QwenVL (#28) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * add qwenvl * add infovqa and docvqa * List task #num sorted * Update prompt messages for image-related tasks * Delete unused task configuration files * Remove coco_train.yaml configuration file * Update task name in mmmu.yaml * Fix error message for missing tasks * Add wandb import and integration --------- Co…

add_ocrbench

6407c14

pufanyi added 3 commits March 25, 2024 20:25

lint

72b5898

save results to the file

cf94793

lint

e00d0ca

pufanyi requested review from pufanyi and Luodian March 25, 2024 13:07

Luodian approved these changes Mar 25, 2024

Luodian merged commit 9dfb53a into EvolvingLMMs-Lab:main Mar 25, 2024
1 check passed

pufanyi removed their request for review March 25, 2024 13:11

kcz358 mentioned this pull request Jun 7, 2024

How to contribute a new dataset? #99

Open

kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024

Merge pull request EvolvingLMMs-Lab#28 from echo840/main

0c47dca

add_ocrbench
