From 13179f9f36aba186e1bb4a1172922c76bafa922a Mon Sep 17 00:00:00 2001
From: kcz358 <92624596+kcz358@users.noreply.github.com>
Date: Sun, 3 Mar 2024 21:19:15 +0800
Subject: [PATCH] [Fix] wandb group logging missing columns (#61)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 9c0bc584841612c338f77ea5dfaa068c36d7456d
Author: Zhang Peiyuan
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (#57)

    * add instructblip
    * minicpm_v
    * remove from qwen-vl
    * speed up postprocessing
    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi
    Co-authored-by: kcz358

commit 30ab0cef298b0faf867f5570e50e5be367adae02
Author: Pu Fanyi
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (#56)

    * refactor vizwizvqa task
    * Delete vqav2_test and vqav2_val YAML files
    * Refactor vqav2_process_results functions
    * Add a pack for vqav2
    * refactor okvqa
    * roll back vizwiz_vqa
    * Fix exact_match calculation in ok_vqa_process_results
    * Update OKVQA dataset name in readme
    * add model_specific_prompt_kwargs
    * add model_specific_prompt_kwargs to vizwiz_vqa
    * add model_specific_prompt_kwargs for vqav2
    * lint
    * fix a small bug for eval_logger
    * Refactor make_table function to display points as " - " if value is None
    * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04'
    * Refactor ok_vqa_aggreate_submissions function
    * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe'
    * Refactor VQA submission file saving
    * Update file utils
    * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5'
    * Refactor file path handling and submission generation
    * OKVQA path
    * vizwizvqa file
    * pack cmmmu
    * fix a small metric bug for cmmmu
    * Add higher_is_better flag to submission metric
    * Add CMMMU dataset to README.md
    * Add logging and refactor submission file generation in docvqa utils.py
    * pack docvqa
    * add traceback to print detailed error
    * Refactor docvqa_test_aggregate_results to accept additional arguments
    * Add metric check in evaluator.py and update test.yaml and val.yaml
    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
    * merge textvqa
    * textvqa
    * Modify submission file generation for COCO test results
    * Update test result storage path
    * update coco cap file name
    * Update COCO 2017 Caption dataset name
    * ferret
    * Add Ferret dataset
    * Refactor hb_doc_to_text function to include model-specific prompts
    * Add IconQA and its subtasks
    * Refactor image list creation in doc_to_visual function
    * Add process_results function to default template
    * Update process_results function in iconqa utils.py
    * refactor flickr30k
    * change aggregation function
    * Fix formatting issues and update logging message
    * Fix llava not handling text-only questions (no visuals)
    * Fix qwen not handling questions without images (no visuals)
    * Add fuyu prepare accelerator scripts
    * refactor mme
    * naming consistency
    * aggregation_submissions consistency
    * flickr30k naming consistency
    * remove submissions for mme
    * remove unused submission function
    * Refactor infovqa_test.yaml and infovqa_val.yaml
    * Refactor code for improved readability and maintainability
    * stvqa
    * rename sqa
    * Update lmms_eval textcaps files and utils.py
    * Update default prompt for text captions
    * Refactor textcaps_aggregation_result function
    * Add generate_submission_file function and update mathvista_aggregate_results signature
    * Update nocaps_test.yaml and nocaps_val.yaml
    * refactor internal_eval
    * Add internal evaluation datasets
    * pack multidocvqa
    * mmvet
    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re-evaluating
    * Refactor llava wild
    * Refactor llava-bench-coco
    * Add JSON file generation for gpt evaluation details
    * mmmu
    * Remove MMBench English and Chinese tasks
    * Remove unnecessary return statement in mmbench_aggregate_test_results function
    * Fix distributed process group initialization
    * Update dataset paths and group names in mmbench test configs
    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
    * Add torch module import
    * lint
    * Remove IconQA dataset from README.md
    * Add Multi-DocVQA and its submodules
    * Add new datasets and update task names
    * Refactor flickr_aggregation_result function to accept additional arguments
    * Add timeout kwargs in Accelerator constructor
    * Add encoding to be utf-8 for cmmmu
    * Fix llava try and catch, remove torch.distributed.init in main
    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder
    Co-authored-by: kcz358

commit a5b07ee6488f3889896e434789bdd2312bf8a251
Author: Li Bo
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (#55)

    * Refactor logging in lmms_eval package
    * Refactor variable names in lmms_eval package
    * add llava main in pyproject
    * Update README.md
    * Remove unnecessary dependencies and add specific version for llava_repr
    * Add dependencies for llava_repr***
    * Update README.md
    * add some docs on models and command line commands
    * remove some lines
    * typo
    * Update model_guide.md
    * Update model_guide.md
    * Update README.md
    * Update README.md
    * Update README.md

* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change image removal to check by type instead of by name
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li
Co-authored-by: Fanyi Pu
Co-authored-by: jzhang38
---
 lmms_eval/logging_utils.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lmms_eval/logging_utils.py b/lmms_eval/logging_utils.py
index e4e2947e9..800dfcd1c 100644
--- a/lmms_eval/logging_utils.py
+++ b/lmms_eval/logging_utils.py
@@ -192,9 +192,15 @@ def make_table(columns: List[str], key: str = "results"):
                 se = dic[m + "_stderr" + "," + f]
                 if se != "N/A":
                     se = "%.4f" % se
-                table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), str(se)])
+                data = [model_name, model_args, k, version, f, n, m, str(v), str(se)]
+                if key == "groups":
+                    data = [self.group_names] + data
+                table.add_data(*data)
             else:
-                table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), ""])
+                data = [model_name, model_args, k, version, f, n, m, str(v), ""]
+                if key == "groups":
+                    data = [self.group_names] + data
+                table.add_data(*data)
     return table
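For illustration, the effect of the patch can be sketched outside the repository. This is a minimal sketch, not the library's actual table schema: the column names and example values below are hypothetical stand-ins (the real column list lives elsewhere in `logging_utils.py`), and `group_names` stands in for `self.group_names`. Only the prepend-on-`groups` logic mirrors the diff above:

```python
import wandb

# Hypothetical stand-ins for the logger's state; values are illustrative.
group_names = ["mme"]  # plays the role of self.group_names
model_name = "llava"
model_args = "pretrained=liuhaotian/llava-v1.5-7b"

# A "groups" table carries one extra leading column, so a 9-element row
# built for the "results" table no longer matches its schema.
columns = ["Groups", "Model", "Args", "Tasks", "Version",
           "Filter", "n-shot", "Metric", "Value", "Stderr"]
table = wandb.Table(columns=columns)

row = [model_name, model_args, "mme", None, "none", 0, "mme_score", "1500.0", ""]

# Before the fix: table.add_data(*row) fails here, since 9 values are
# supplied for 10 columns. The fix prepends the group names when the
# row is destined for a "groups" table:
key = "groups"
if key == "groups":
    row = [group_names] + row
table.add_data(*row)  # now 10 values for 10 columns
```

The same conditional appears twice in the diff because the original code builds the row in two branches (with and without a stderr value); both row shapes needed the extra leading cell.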