scienceqa for full set (#32)

pufanyi · Luodian · kcz358 · web-flow · commit 0f183a394426 · 2024-01-30T14:52:51.000+08:00
* Remove unused code and configuration file
* Remove docvqa.yaml and update vizwizvqa.yaml
* lint
* Add dataset_kwargs to vizwizvqa.yaml
* textvqa (#27)
* Update textvqa.yaml and utils.py
* Fix YAML formatting in textvqa.yaml and remove unused files
* remove useless metric
* add textvqa val & test
* Update progress bar description in evaluator.py
* Update submission file names in VizWizVQA tasks
* Update output path to include log samples suffix
* Update submission file paths in OKVQA and VizWizVQA tasks
* Refactor llava-in-the-wild.yaml and utils.py
* Update metric for llava evaluation
* Refactor logging message in Task class
* Merge commit '0a403e6f5e17c70a50983c83a132edf0fdcd98de'
* Fix formatting issues and add progress bar closing statements
* Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml
* Update tqdm progress bar in OtterHD model
* Squashed commit of the following:

  commit eae210c3700a59b7d5cc9de46fcb855f443096aa
  Author: kcz358 <92624596+kcz358@users.noreply.github.com>
  Date: Sun Jan 28 09:46:19 2024 +0800

      Black lint

  commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
  Merge: ab898e4 fb209e4
  Author: kcz358 <92624596+kcz358@users.noreply.github.com>
  Date: Sun Jan 28 09:45:31 2024 +0800

      Merge branch 'main' into kc/list_tasks_num

  commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
  Author: kcz358 <92624596+kcz358@users.noreply.github.com>
  Date: Sun Jan 28 09:44:23 2024 +0800

      Enable list all tasks num

  commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
  Author: kcz358 <92624596+kcz358@users.noreply.github.com>
  Date: Sun Jan 28 09:41:32 2024 +0800

      Exclude train yaml file in the task list

  commit 0a403e6
  Author: Zhang Peiyuan <a1286225768@gmail.com>
  Date: Sun Jan 28 02:04:57 2024 +0800

      Add InfoVQA, DocVQA, and QwenVL (#28)

      * add mmme
      * black
      * add model specific prompt and gen kwargs
      * black
      * add yaml config to support multi-model eval
      * print table at the end
      * refactor multi model code
      * add chartqa
      * black
      * add ai2d
      * black
      * update chartqa
      * black
      * update ai2d dataset
      * black
      * add qwenvl
      * add infovqa and docvqa

* Fix error handling in loading YAML config files
* Squashed commit of the following:

  commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
  Author: kcz358 <92624596+kcz358@users.noreply.github.com>
  Date: Sun Jan 28 12:41:40 2024 +0800

      Fix key bugs

* List task #num sorted
* Update prompt messages for image-related tasks
* Delete unused task configuration files
* Remove coco_train.yaml configuration file
* Update task name in mmmu.yaml
* Fix error message for missing tasks
* Add wandb import and integration
* Update generation kwargs for LMMS tasks
* Update lmms_eval MME task configuration and utils
* Update generation_kwargs in lmms_eval tasks
* Update doc_to_text function in coco and okvqa tasks
* Add COCO 2017 version
* Update task name in coco_test2017.yaml
* Squashed commit of the following:

  commit 1e2ae93
  Author: Zhang Peiyuan <a1286225768@gmail.com>
  Date: Mon Jan 29 22:41:33 2024 +0800

      Add/mmmu test (#30)

      * mmmu_test
      * black

  commit 10bbaf0
  Author: Li Bo <drluodian@gmail.com>
  Date: Sun Jan 28 22:19:13 2024 +0800

      [Dataset Check] dataset check and add wandb logging (#29)

      ---------

      Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
      Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>

* Remove scienceqa_img task configuration
* eval scienceqa with no images

---------

Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
diff --git a/lmms_eval/__main__.py b/lmms_eval/__main__.py
@@ -307,7 +307,6 @@ def print_results(args, results):
             else:
                 # use the name of the config file as run name
                 wandb_args_dict["name"] = all_args_dict["config"].split("/")[-1].split(".")[0]
-
         wandb_run = wandb.init(**wandb_args_dict)
         is_main_process = True
     else:
diff --git a/lmms_eval/tasks/scienceqa_img/scienceqa.yaml b/lmms_eval/tasks/scienceqa_img/scienceqa.yaml
@@ -1,5 +1,6 @@
-dataset_path: lmms-lab/ScienceQA-IMG
-task: "scienceqa_img"
+dataset_path: lmms-lab/ScienceQA
+dataset_name: ScienceQA-FULL
+task: "scienceqa"
 dataset_kwargs:
   token: True
 test_split: test
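
The hunk above moves the task from a dedicated dataset repository to a named configuration (`ScienceQA-FULL`) of a shared `lmms-lab/ScienceQA` repository. A minimal sketch of how such keys could map onto loader arguments, assuming the harness forwards `dataset_path`, `dataset_name`, and `dataset_kwargs` to `datasets.load_dataset` as `path`, `name`, and extra keyword arguments (that forwarding is not shown in this diff; `load_dataset_args` is a hypothetical helper, not part of lmms_eval):

```python
def load_dataset_args(task_config: dict) -> dict:
    """Hypothetical helper: turn task YAML keys into loader keyword arguments."""
    args = {"path": task_config["dataset_path"]}
    # dataset_name selects a configuration (subset) inside the repository.
    if "dataset_name" in task_config:
        args["name"] = task_config["dataset_name"]
    # dataset_kwargs (e.g. token: True) are passed through unchanged.
    args.update(task_config.get("dataset_kwargs") or {})
    return args


# Values copied from the two YAML files in this commit.
full_config = {
    "dataset_path": "lmms-lab/ScienceQA",
    "dataset_name": "ScienceQA-FULL",
    "dataset_kwargs": {"token": True},
}
img_config = {**full_config, "dataset_name": "ScienceQA-IMG"}

full_args = load_dataset_args(full_config)
img_args = load_dataset_args(img_config)
```

Under that assumption, the two tasks differ only in the `name` argument, so `datasets.load_dataset(**full_args, split="test")` would presumably fetch the full test split and the `img_args` variant the image-only subset.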
diff --git a/lmms_eval/tasks/scienceqa_img/scienceqa_img.yaml b/lmms_eval/tasks/scienceqa_img/scienceqa_img.yaml
@@ -0,0 +1,32 @@
+dataset_path: lmms-lab/ScienceQA
+dataset_name: ScienceQA-IMG
+task: "scienceqa_img"
+dataset_kwargs:
+  token: True
+test_split: test
+output_type: generate_until
+doc_to_visual: !function utils.sqa_doc_to_visual
+doc_to_text: !function utils.sqa_doc_to_text
+doc_to_target: !function utils.sqa_doc_to_target
+generation_kwargs:
+  max_new_tokens: 16
+  temperature: 0
+  do_sample: False
+metric_list:
+  - metric: exact_match
+    aggregation: mean
+    higher_is_better: true
+    ignore_case: true
+    ignore_punctuation: true
+process_results: !function utils.sqa_process_results
+metadata:
+  - version: 0.0
+
+model_specific_prompt_kwargs:
+  default:
+    pre_prompt: ""
+    post_prompt: "\nAnswer with the option's letter from the given choices directly."
+model_specific_generation_kwargs:
+  llava:
+    image_aspect_ratio: original
+
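
The new config combines a `post_prompt` that asks for a bare option letter with an `exact_match` metric that sets `ignore_case` and `ignore_punctuation`. The sketch below illustrates how those settings could interact. The real `utils.sqa_doc_to_text` and the harness's `exact_match` implementation are not part of this diff, so `build_prompt`, `normalize`, and `exact_match` here are hypothetical stand-ins, and the lettered option layout is an assumption.

```python
import string

# Copied from model_specific_prompt_kwargs.default in scienceqa_img.yaml.
PRE_PROMPT = ""
POST_PROMPT = "\nAnswer with the option's letter from the given choices directly."


def build_prompt(question, choices, pre_prompt=PRE_PROMPT, post_prompt=POST_PROMPT):
    """Hypothetical prompt builder: question, lettered options, then post_prompt."""
    options = "\n".join(
        f"{string.ascii_uppercase[i]}. {choice}" for i, choice in enumerate(choices)
    )
    return f"{pre_prompt}{question}\n{options}{post_prompt}"


def normalize(text, ignore_case=True, ignore_punctuation=True):
    """Apply the two normalization flags from metric_list before comparing."""
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text.strip()


def exact_match(prediction, target, **flags):
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return float(normalize(prediction, **flags) == normalize(target, **flags))


prompt = build_prompt("Which animal is a mammal?", ["frog", "whale"])
score_same = exact_match("B.", "b")
score_diff = exact_match("A", "B")
```

With both flags on, a model reply such as `B.` would still count as matching a target of `b`, which is presumably why the prompt asks for the letter alone and `max_new_tokens` is kept small.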