From 13179f9f36aba186e1bb4a1172922c76bafa922a Mon Sep 17 00:00:00 2001
From: kcz358 <92624596+kcz358@users.noreply.github.com>
Date: Sun, 3 Mar 2024 21:19:15 +0800
Subject: [PATCH] [Fix] wandb group logging missing columns (#61)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 9c0bc584841612c338f77ea5dfaa068c36d7456d
Author: Zhang Peiyuan
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (#57)

    * add instructblip
    * minicpm_v
    * remove from qwen-vl
    * speed up postprocessing
    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi
    Co-authored-by: kcz358

commit 30ab0cef298b0faf867f5570e50e5be367adae02
Author: Pu Fanyi
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (#56)

    * refactor vizwizvqa task
    * Delete vqav2_test and vqav2_val YAML files
    * Refactor vqav2_process_results functions
    * Add a pack for vqav2
    * refactor okvqa
    * roll back vizwiz_vqa
    * Fix exact_match calculation in ok_vqa_process_results
    * Update OKVQA dataset name in readme
    * add model_specific_prompt_kwargs
    * add model_specific_prompt_kwargs to vizwiz_vqa
    * add model_specific_prompt_kwargs for vqav2
    * lint
    * fix a small bug for eval_logger
    * Refactor make_table function to display points as " - " if value is None
    * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04'
    * Refactor ok_vqa_aggreate_submissions function
    * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe'
    * Refactor VQA submission file saving
    * Update file utils
    * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5'
    * Refactor file path handling and submission generation
    * OKVQA path
    * vizwizvqa file
    * pack cmmmu
    * fix a small metric bug for cmmmu
    * Add higher_is_better flag to submission metric
    * Add CMMMU dataset to README.md
    * Add logging and refactor submission file generation in docvqa utils.py
    * pack docvqa
    * add traceback to print detailed error
    * Refactor docvqa_test_aggregate_results to accept additional arguments
    * Add metric check in evaluator.py and update test.yaml and val.yaml
    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
    * merge textvqa
    * textvqa
    * Modify submission file generation for COCO test results
    * Update test result storage path
    * update coco cap file name
    * Update COCO 2017 Caption dataset name
    * ferret
    * Add Ferret dataset
    * Refactor hb_doc_to_text function to include model-specific prompts
    * Add IconQA and its subtasks
    * Refactor image list creation in doc_to_visual function
    * Add process_results function to default template
    * Update process_results function in iconqa utils.py
    * refactor flickr30k
    * change aggregation function
    * Fix formatting issues and update logging message
    * Fix llava not handling text-only questions (no visuals)
    * Fix qwen not handling questions without images (no visuals)
    * Add fuyu prepare accelerator scripts
    * refactor mme
    * naming consistency
    * aggregation_submissions consistency
    * flickr30k naming consistency
    * remove submissions for mme
    * remove unused submission function
    * Refactor infovqa_test.yaml and infovqa_val.yaml
    * Refactor code for improved readability and maintainability
    * stvqa
    * rename sqa
    * Update lmms_eval textcaps files and utils.py
    * Update default prompt for text captions
    * Refactor textcaps_aggregation_result function
    * Add generate_submission_file function and update mathvista_aggregate_results signature
    * Update nocaps_test.yaml and nocaps_val.yaml
    * refactor internal_eval
    * Add internal evaluation datasets
    * pack multidocvqa
    * mmvet
    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re-evaluating
    * Refactor llava wild
    * Refactor llava-bench-coco
    * Add JSON file generation for gpt evaluation details
    * mmmu
    * Remove MMBench English and Chinese tasks
    * Remove unnecessary return statement in mmbench_aggregate_test_results function
    * Fix distributed process group initialization
    * Update dataset paths and group names in mmbench test configs
    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
    * Add torch module import
    * lint
    * Remove IconQA dataset from README.md
    * Add Multi-DocVQA and its submodules
    * Add new datasets and update task names
    * Refactor flickr_aggregation_result function to accept additional arguments
    * Add timeout kwargs in Accelerator constructor
    * Add encoding to be utf-8 for cmmmu
    * Fix llava try and catch, remove torch.distributed.init in main
    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder
    Co-authored-by: kcz358

commit a5b07ee6488f3889896e434789bdd2312bf8a251
Author: Li Bo
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (#55)

    * Refactor logging in lmms_eval package
    * Refactor variable names in lmms_eval package
    * add llava main in pyproject
    * Update README.md
    * Remove unnecessary dependencies and add specific version for llava_repr
    * Add dependencies for llava_repr***
    * Update README.md
    * add some docs on models and command line commands
    * remove some lines
    * typo
    * Update model_guide.md
    * Update model_guide.md
    * Update README.md
    * Update README.md
    * Update README.md

* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change image removal to check by type instead of by name
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li
Co-authored-by: Fanyi Pu
Co-authored-by: jzhang38
---
 lmms_eval/logging_utils.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lmms_eval/logging_utils.py b/lmms_eval/logging_utils.py
index e4e2947e9..800dfcd1c 100644
--- a/lmms_eval/logging_utils.py
+++ b/lmms_eval/logging_utils.py
@@ -192,9 +192,15 @@ def make_table(columns: List[str], key: str = "results"):
                 se = dic[m + "_stderr" + "," + f]
                 if se != "N/A":
                     se = "%.4f" % se
-                table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), str(se)])
+                data = [model_name, model_args, k, version, f, n, m, str(v), str(se)]
+                if key == "groups":
+                    data = [self.group_names] + data
+                table.add_data(*data)
             else:
-                table.add_data(*[model_name, model_args, k, version, f, n, m, str(v), ""])
+                data = [model_name, model_args, k, version, f, n, m, str(v), ""]
+                if key == "groups":
+                    data = [self.group_names] + data
+                table.add_data(*data)
     return table
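For illustration, the effect of the patch can be sketched outside the repository. This is a minimal sketch, not the library's actual table schema: the column names and example values below are hypothetical stand-ins (the real column list lives elsewhere in `logging_utils.py`), and `group_names` stands in for `self.group_names`. Only the prepend-on-`groups` logic mirrors the diff above:

```python
import wandb

# Hypothetical stand-ins for the logger's state; values are illustrative.
group_names = ["mme"]  # plays the role of self.group_names
model_name = "llava"
model_args = "pretrained=liuhaotian/llava-v1.5-7b"

# A "groups" table carries one extra leading column, so a 9-element row
# built for the "results" table no longer matches its schema.
columns = ["Groups", "Model", "Args", "Tasks", "Version",
           "Filter", "n-shot", "Metric", "Value", "Stderr"]
table = wandb.Table(columns=columns)

row = [model_name, model_args, "mme", None, "none", 0, "mme_score", "1500.0", ""]

# Before the fix: table.add_data(*row) fails here, since 9 values are
# supplied for 10 columns. The fix prepends the group names when the
# row is destined for a "groups" table:
key = "groups"
if key == "groups":
    row = [group_names] + row
table.add_data(*row)  # now 10 values for 10 columns
```

The same conditional appears twice in the diff because the original code builds the row in two branches (with and without a stderr value); both row shapes needed the extra leading cell.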