[Fix] fix bugs (EvolvingLMMs-Lab#41)
* add fuyu

* Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed'

* Squashed commit of the following:

commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032
Author: kcz358 <[email protected]>
Date:   Tue Jan 30 19:39:57 2024 +0800

    Add hallu bench

commit 6d570ac1d98a03585c8119ccb362e13ab2172fed
Author: Pu Fanyi <[email protected]>
Date:   Tue Jan 30 14:52:51 2024 +0800

    scienceqa for full set (#32)

    * Remove unused code and configuration file

    * Remove docvqa.yaml and update vizwizvqa.yaml

    * lint

    * Add dataset_kwargs to vizwizvqa.yaml

    * Add dataset_kwargs to vizwizvqa.yaml

    * textvqa (#27)

    * Update textvqa.yaml and utils.py

    * Fix YAML formatting in textvqa.yaml and remove unused files

    * remove useless metric

    * add textvqa val & test

    * Update progress bar description in evaluator.py

    * Update submission file names in VizWizVQA tasks

    * Update output path to include log samples suffix

    * Update submission file paths in OKVQA and VizWizVQA tasks

    * Refactor llava-in-the-wild.yaml and utils.py

    * Update metric for llava evaluation

    * Refactor logging message in Task class

    * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

    * Fix formatting issues and add progress bar closing statements

    * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

    * Update tqdm progress bar in OtterHD model

    * Squashed commit of the following:

    commit eae210c3700a59b7d5cc9de46fcb855f443096aa
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:46:19 2024 +0800

        Black lint

    commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
    Merge: ab898e4 fb209e4
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:45:31 2024 +0800

        Merge branch 'main' into kc/list_tasks_num

    commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:44:23 2024 +0800

        Enable list all tasks num

    commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:41:32 2024 +0800

        Exclude train yaml file in the task list

    commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
    Author: Zhang Peiyuan <[email protected]>
    Date:   Sun Jan 28 02:04:57 2024 +0800

        Add InfoVQA, DocVQA, and QwenVL (#28)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

        * add chartqa

        * black

        * add ai2d

        * black

        * update chartqa

        * black

        * update ai2d dataset

        * black

        * add qwenvl

        * add infovqa and docvqa

    * Fix error handling in loading YAML config files

    * Squashed commit of the following:

    commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 12:41:40 2024 +0800

        Fix key bugs

    commit eae210c3700a59b7d5cc9de46fcb855f443096aa
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:46:19 2024 +0800

        Black lint

    commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
    Merge: ab898e4 fb209e4
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:45:31 2024 +0800

        Merge branch 'main' into kc/list_tasks_num

    commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:44:23 2024 +0800

        Enable list all tasks num

    commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:41:32 2024 +0800

        Exclude train yaml file in the task list

    commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
    Author: Zhang Peiyuan <[email protected]>
    Date:   Sun Jan 28 02:04:57 2024 +0800

        Add InfoVQA, DocVQA, and QwenVL (#28)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

        * add chartqa

        * black

        * add ai2d

        * black

        * update chartqa

        * black

        * update ai2d dataset

        * black

        * add qwenvl

        * add infovqa and docvqa

    * List task #num sorted

    * Update prompt messages for image-related tasks

    * Delete unused task configuration files

    * Remove coco_train.yaml configuration file

    * Update task name in mmmu.yaml

    * Fix error message for missing tasks

    * Add wandb import and integration

    * Update generation kwargs for LMMS tasks

    * Update lmms_eval MME task configuration and utils

    * Update generation_kwargs in lmms_eval tasks

    * Update doc_to_text function in coco and okvqa tasks

    * Add COCO 2017 version

    * Update task name in coco_test2017.yaml

    * Squashed commit of the following:

    commit fbb7aa57856f800d6c18413318830f4bbc6c8157
    Author: Zhang Peiyuan <[email protected]>
    Date:   Mon Jan 29 22:41:33 2024 +0800

        Add/mmmu test (#30)

        * mmmu_test

        * black

    commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010
    Author: Li Bo <[email protected]>
    Date:   Sun Jan 28 22:19:13 2024 +0800

        [Dataset Check] dataset check and add wandb logging (#29)

        * Remove unused code and configuration file

        * Remove docvqa.yaml and update vizwizvqa.yaml

        * lint

        * Add dataset_kwargs to vizwizvqa.yaml

        * Add dataset_kwargs to vizwizvqa.yaml

        * textvqa (#27)

        * Update textvqa.yaml and utils.py

        * Fix YAML formatting in textvqa.yaml and remove unused files

        * remove useless metric

        * add textvqa val & test

        * Update progress bar description in evaluator.py

        * Update submission file names in VizWizVQA tasks

        * Update output path to include log samples suffix

        * Update submission file paths in OKVQA and VizWizVQA tasks

        * Refactor llava-in-the-wild.yaml and utils.py

        * Update metric for llava evaluation

        * Refactor logging message in Task class

        * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

        * Fix formatting issues and add progress bar closing statements

        * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

        * Update tqdm progress bar in OtterHD model

        * Squashed commit of the following:

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to support multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * black

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * Fix error handling in loading YAML config files

        * Squashed commit of the following:

        commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 12:41:40 2024 +0800

            Fix key bugs

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to support multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * black

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * List task #num sorted

        * Update prompt messages for image-related tasks

        * Delete unused task configuration files

        * Remove coco_train.yaml configuration file

        * Update task name in mmmu.yaml

        * Fix error message for missing tasks

        * Add wandb import and integration

        ---------

        Co-authored-by: Fanyi Pu <[email protected]>
        Co-authored-by: kcz358 <[email protected]>

    * Remove scienceqa_img task configuration

    * eval scienceqa with no images

    ---------

    Co-authored-by: Bo Li <[email protected]>
    Co-authored-by: kcz358 <[email protected]>

* Update hb_doc_to_text function to remove unnecessary line break

* Add Fuyu model and update OtterHD model

* Refactor model response handling and fix image processing bug

* Refactor flatten method to support only getting the first element

* Add support for specifying timezone in datetime string

Update flatten method in OtterHD class

Update get_datetime_str function in utils.py

* Fix condition for checking wandb_args_dict in __main__.py

* Commented out assertions for batch size in Fuyu model

* Add warning message for existing output file

* Fix batch size issue in OtterHD model

* Squashed commit of the following:

commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a
Author: Li Bo <[email protected]>
Date:   Wed Jan 31 16:00:22 2024 +0800

    [Datasets] add hallubench (#34)

    * Add hallu bench

    * Fix hall_b gpt eval bugs

    ---------

    Co-authored-by: kcz358 <[email protected]>

commit a781057ad07b0a60c7ef682f864be598b2436b7c
Author: Li Bo <[email protected]>
Date:   Wed Jan 31 14:23:15 2024 +0800

    [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33)

    * add fuyu

    * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed'

    * Squashed commit of the following:

    commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032
    Author: kcz358 <[email protected]>
    Date:   Tue Jan 30 19:39:57 2024 +0800

        Add hallu bench

    commit 6d570ac1d98a03585c8119ccb362e13ab2172fed
    Author: Pu Fanyi <[email protected]>
    Date:   Tue Jan 30 14:52:51 2024 +0800

        scienceqa for full set (#32)

        * Remove unused code and configuration file

        * Remove docvqa.yaml and update vizwizvqa.yaml

        * lint

        * Add dataset_kwargs to vizwizvqa.yaml

        * Add dataset_kwargs to vizwizvqa.yaml

        * textvqa (#27)

        * Update textvqa.yaml and utils.py

        * Fix YAML formatting in textvqa.yaml and remove unused files

        * remove useless metric

        * add textvqa val & test

        * Update progress bar description in evaluator.py

        * Update submission file names in VizWizVQA tasks

        * Update output path to include log samples suffix

        * Update submission file paths in OKVQA and VizWizVQA tasks

        * Refactor llava-in-the-wild.yaml and utils.py

        * Update metric for llava evaluation

        * Refactor logging message in Task class

        * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

        * Fix formatting issues and add progress bar closing statements

        * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

        * Update tqdm progress bar in OtterHD model

        * Squashed commit of the following:

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to support multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * black

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * Fix error handling in loading YAML config files

        * Squashed commit of the following:

        commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 12:41:40 2024 +0800

            Fix key bugs

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to support multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * black

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * List task #num sorted

        * Update prompt messages for image-related tasks

        * Delete unused task configuration files

        * Remove coco_train.yaml configuration file

        * Update task name in mmmu.yaml

        * Fix error message for missing tasks

        * Add wandb import and integration

        * Update generation kwargs for LMMS tasks

        * Update lmms_eval MME task configuration and utils

        * Update generation_kwargs in lmms_eval tasks

        * Update doc_to_text function in coco and okvqa tasks

        * Add COCO 2017 version

        * Update task name in coco_test2017.yaml

        * Squashed commit of the following:

        commit fbb7aa57856f800d6c18413318830f4bbc6c8157
        Author: Zhang Peiyuan <[email protected]>
        Date:   Mon Jan 29 22:41:33 2024 +0800

            Add/mmmu test (#30)

            * mmmu_test

            * black

        commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010
        Author: Li Bo <[email protected]>
        Date:   Sun Jan 28 22:19:13 2024 +0800

            [Dataset Check] dataset check and add wandb logging (#29)

            * Remove unused code and configuration file

            * Remove docvqa.yaml and update vizwizvqa.yaml

            * lint

            * Add dataset_kwargs to vizwizvqa.yaml

            * Add dataset_kwargs to vizwizvqa.yaml

            * textvqa (#27)

            * Update textvqa.yaml and utils.py

            * Fix YAML formatting in textvqa.yaml and remove unused files

            * remove useless metric

            * add textvqa val & test

            * Update progress bar description in evaluator.py

            * Update submission file names in VizWizVQA tasks

            * Update output path to include log samples suffix

            * Update submission file paths in OKVQA and VizWizVQA tasks

            * Refactor llava-in-the-wild.yaml and utils.py

            * Update metric for llava evaluation

            * Refactor logging message in Task class

            * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

            * Fix formatting issues and add progress bar closing statements

            * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

            * Update tqdm progress bar in OtterHD model

            * Squashed commit of the following:

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to support multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * black

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * Fix error handling in loading YAML config files

            * Squashed commit of the following:

            commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 12:41:40 2024 +0800

                Fix key bugs

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to support multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * black

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * List task #num sorted

            * Update prompt messages for image-related tasks

            * Delete unused task configuration files

            * Remove coco_train.yaml configuration file

            * Update task name in mmmu.yaml

            * Fix error message for missing tasks

            * Add wandb import and integration

            ---------

            Co-authored-by: Fanyi Pu <[email protected]>
            Co-authored-by: kcz358 <[email protected]>

        * Remove scienceqa_img task configuration

        * eval scienceqa with no images

        ---------

        Co-authored-by: Bo Li <[email protected]>
        Co-authored-by: kcz358 <[email protected]>

    * Update hb_doc_to_text function to remove unnecessary line break

    * Add Fuyu model and update OtterHD model

    * Refactor model response handling and fix image processing bug

    * Refactor flatten method to support only getting the first element

    * Add support for specifying timezone in datetime string

    Update flatten method in OtterHD class

    Update get_datetime_str function in utils.py

    * Fix condition for checking wandb_args_dict in __main__.py

    * Commented out assertions for batch size in Fuyu model

    * Add warning message for existing output file

commit 6d570ac1d98a03585c8119ccb362e13ab2172fed
Author: Pu Fanyi <[email protected]>
Date:   Tue Jan 30 14:52:51 2024 +0800

    scienceqa for full set (#32)

    * Remove unused code and configuration file

    * Remove docvqa.yaml and update vizwizvqa.yaml

    * lint

    * Add dataset_kwargs to vizwizvqa.yaml

    * Add dataset_kwargs to vizwizvqa.yaml

    * textvqa (#27)

    * Update textvqa.yaml and utils.py

    * Fix YAML formatting in textvqa.yaml and remove unused files

    * remove useless metric

    * add textvqa val & test

    * Update progress bar description in evaluator.py

    * Update submission file names in VizWizVQA tasks

    * Update output path to include log samples suffix

    * Update submission file paths in OKVQA and VizWizVQA tasks

    * Refactor llava-in-the-wild.yaml and utils.py

    * Update metric for llava evaluation

    * Refactor logging message in Task class

    * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

    * Fix formatting issues and add progress bar closing statements

    * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

    * Update tqdm progress bar in OtterHD model

    * Squashed commit of the following:

    commit eae210c3700a59b7d5cc9de46fcb855f443096aa
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:46:19 2024 +0800

        Black lint

    commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
    Merge: ab898e4 fb209e4
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:45:31 2024 +0800

        Merge branch 'main' into kc/list_tasks_num

    commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:44:23 2024 +0800

        Enable list all tasks num

    commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:41:32 2024 +0800

        Exclude train yaml file in the task list

    commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
    Author: Zhang Peiyuan <[email protected]>
    Date:   Sun Jan 28 02:04:57 2024 +0800

        Add InfoVQA, DocVQA, and QwenVL (#28)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

        * add chartqa

        * black

        * add ai2d

        * black

        * update chartqa

        * black

        * update ai2d dataset

        * black

        * add qwenvl

        * add infovqa and docvqa

    * Fix error handling in loading YAML config files

    * Squashed commit of the following:

    commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 12:41:40 2024 +0800

        Fix key bugs

    commit eae210c3700a59b7d5cc9de46fcb855f443096aa
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:46:19 2024 +0800

        Black lint

    commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
    Merge: ab898e4 fb209e4
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:45:31 2024 +0800

        Merge branch 'main' into kc/list_tasks_num

    commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:44:23 2024 +0800

        Enable list all tasks num

    commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
    Author: kcz358 <[email protected]>
    Date:   Sun Jan 28 09:41:32 2024 +0800

        Exclude train yaml file in the task list

    commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
    Author: Zhang Peiyuan <[email protected]>
    Date:   Sun Jan 28 02:04:57 2024 +0800

        Add InfoVQA, DocVQA, and QwenVL (#28)

        * add mmme

        * black

        * add model specific prompt and gen kwargs

        * black

        * add yaml config to support multi-model eval

        * print table at the end

        * refactor multi model code

        * add chartqa

        * black

        * add ai2d

        * black

        * update chartqa

        * black

        * update ai2d dataset

        * black

        * add qwenvl

        * add infovqa and docvqa

    * List task #num sorted

    * Update prompt messages for image-related tasks

    * Delete unused task configuration files

    * Remove coco_train.yaml configuration file

    * Update task name in mmmu.yaml

    * Fix error message for missing tasks

    * Add wandb import and integration

    * Update generation kwargs for LMMS tasks

    * Update lmms_eval MME task configuration and utils

    * Update generation_kwargs in lmms_eval tasks

    * Update doc_to_text function in coco and okvqa tasks

    * Add COCO 2017 version

    * Update task name in coco_test2017.yaml

    * Squashed commit of the following:

    commit fbb7aa57856f800d6c18413318830f4bbc6c8157
    Author: Zhang Peiyuan <[email protected]>
    Date:   Mon Jan 29 22:41:33 2024 +0800

        Add/mmmu test (#30)

        * mmmu_test

        * black

    commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010
    Author: Li Bo <[email protected]>
    Date:   Sun Jan 28 22:19:13 2024 +0800

        [Dataset Check] dataset check and add wandb logging (#29)

        * Remove unused code and configuration file

        * Remove docvqa.yaml and update vizwizvqa.yaml

        * lint

        * Add dataset_kwargs to vizwizvqa.yaml

        * Add dataset_kwargs to vizwizvqa.yaml

        * textvqa (#27)

        * Update textvqa.yaml and utils.py

        * Fix YAML formatting in textvqa.yaml and remove unused files

        * remove useless metric

        * add textvqa val & test

        * Update progress bar description in evaluator.py

        * Update submission file names in VizWizVQA tasks

        * Update output path to include log samples suffix

        * Update submission file paths in OKVQA and VizWizVQA tasks

        * Refactor llava-in-the-wild.yaml and utils.py

        * Update metric for llava evaluation

        * Refactor logging message in Task class

        * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

        * Fix formatting issues and add progress bar closing statements

        * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

        * Update tqdm progress bar in OtterHD model

        * Squashed commit of the following:

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to support multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * black

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * Fix error handling in loading YAML config files

        * Squashed commit of the following:

        commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 12:41:40 2024 +0800

            Fix key bugs

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to supprot multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * blacl

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * List task #num sorted

        * Update prompt messages for image-related tasks

        * Delete unused task configuration files

        * Remove coco_train.yaml configuration file

        * Update task name in mmmu.yaml

        * Fix error message for missing tasks

        * Add wandb import and integration

        ---------

        Co-authored-by: Fanyi Pu <[email protected]>
        Co-authored-by: kcz358 <[email protected]>

    * Remove scienceqa_img task configuration

    * eval scienceqa with no images

    ---------

    Co-authored-by: Bo Li <[email protected]>
    Co-authored-by: kcz358 <[email protected]>

* Update API configuration and file paths

* Refactor evaluate_by_chatgpt function in utils.py

* Add hallusion_output_vd_model.json to .gitignore

* Add timeout to API request

* Refactor file path generation and remove unnecessary suffix in log samples output names

* Refactor code and add output path handling

* Update lmms-eval API and add new models and datasets

* Refactor directory structure for RefCOCO+ and RefCOCOg datasets

* Fix error logging in get_eval and parse_score functions

* Update .gitignore and mme.yaml

* Squashed commit of the following:

commit 04a4076120c4d337d70992b82bf2b4fa4c700359
Author: jzhang38 <[email protected]>
Date:   Fri Feb 2 13:43:28 2024 +0800

    black

commit b3c423a93d944a2621c1fa4192616af048e5b77c
Author: jzhang38 <[email protected]>
Date:   Fri Feb 2 13:42:03 2024 +0800

    adapt qwen to sqa, gqa, ai2d, docvqa

commit c3b0da62994f646141456b60baaa3ee5713f38fa
Author: Li Bo <[email protected]>
Date:   Thu Feb 1 16:20:27 2024 +0800

    [Dataset] fix hallusion benchmark, add saving logic inside aggregate function (#35)

    * add fuyu

    * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed'

    * Squashed commit of the following:

    commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032
    Author: kcz358 <[email protected]>
    Date:   Tue Jan 30 19:39:57 2024 +0800

        Add hallu bench

    commit 6d570ac1d98a03585c8119ccb362e13ab2172fed
    Author: Pu Fanyi <[email protected]>
    Date:   Tue Jan 30 14:52:51 2024 +0800

        scienceqa for full set (#32)

        * Remove unused code and configuration file

        * Remove docvqa.yaml and update vizwizvqa.yaml

        * lint

        * Add dataset_kwargs to vizwizvqa.yaml

        * Add dataset_kwargs to vizwizvqa.yaml

        * textvqa (#27)

        * Update textvqa.yaml and utils.py

        * Fix YAML formatting in textvqa.yaml and remove unused files

        * remove useless matric

        * add textvqa val & test

        * Update progress bar description in evaluator.py

        * Update submission file names in VizWizVQA tasks

        * Update output path to include log samples suffix

        * Update submission file paths in OKVQA and VizWizVQA tasks

        * Refactor llava-in-the-wild.yaml and utils.py

        * Update metric for llava evaluation

        * Refactor logging message in Task class

        * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

        * Fix formatting issues and add progress bar closing statements

        * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

        * Update tqdm progress bar in OtterHD model

        * Squashed commit of the following:

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to supprot multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * blacl

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * Fix error handling in loading YAML config files

        * Squashed commit of the following:

        commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 12:41:40 2024 +0800

            Fix key bugs

        commit eae210c3700a59b7d5cc9de46fcb855f443096aa
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:46:19 2024 +0800

            Black lint

        commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
        Merge: ab898e4 fb209e4
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:45:31 2024 +0800

            Merge branch 'main' into kc/list_tasks_num

        commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:44:23 2024 +0800

            Enable list all tasks num

        commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
        Author: kcz358 <[email protected]>
        Date:   Sun Jan 28 09:41:32 2024 +0800

            Exclude train yaml file in the task list

        commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
        Author: Zhang Peiyuan <[email protected]>
        Date:   Sun Jan 28 02:04:57 2024 +0800

            Add InfoVQA, DocVQA, and QwenVL (#28)

            * add mmme

            * black

            * add model specific prompt and gen kwargs

            * black

            * add yaml config to supprot multi-model eval

            * print table at the end

            * refactor multi model code

            * add chartqa

            * black

            * add ai2d

            * black

            * update chartqa

            * blacl

            * update ai2d dataset

            * black

            * add qwenvl

            * add infovqa and docvqa

        * List task #num sorted

        * Update prompt messages for image-related tasks

        * Delete unused task configuration files

        * Remove coco_train.yaml configuration file

        * Update task name in mmmu.yaml

        * Fix error message for missing tasks

        * Add wandb import and integration

        * Update generation kwargs for LMMS tasks

        * Update lmms_eval MME task configuration and utils

        * Update generation_kwargs in lmms_eval tasks

        * Update doc_to_text function in coco and okvqa tasks

        * Add COCO 2017 version

        * Update task name in coco_test2017.yaml

        * Squashed commit of the following:

        commit fbb7aa57856f800d6c18413318830f4bbc6c8157
        Author: Zhang Peiyuan <[email protected]>
        Date:   Mon Jan 29 22:41:33 2024 +0800

            Add/mmmu test (#30)

            * mmmu_test

            * black

        commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010
        Author: Li Bo <[email protected]>
        Date:   Sun Jan 28 22:19:13 2024 +0800

            [Dataset Check] dataset check and add wandb logging (#29)

            * Remove unused code and configuration file

            * Remove docvqa.yaml and update vizwizvqa.yaml

            * lint

            * Add dataset_kwargs to vizwizvqa.yaml

            * Add dataset_kwargs to vizwizvqa.yaml

            * textvqa (#27)

            * Update textvqa.yaml and utils.py

            * Fix YAML formatting in textvqa.yaml and remove unused files

            * remove useless matric

            * add textvqa val & test

            * Update progress bar description in evaluator.py

            * Update submission file names in VizWizVQA tasks

            * Update output path to include log samples suffix

            * Update submission file paths in OKVQA and VizWizVQA tasks

            * Refactor llava-in-the-wild.yaml and utils.py

            * Update metric for llava evaluation

            * Refactor logging message in Task class

            * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

            * Fix formatting issues and add progress bar closing statements

            * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

            * Update tqdm progress bar in OtterHD model

            * Squashed commit of the following:

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to supprot multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * blacl

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * Fix error handling in loading YAML config files

            * Squashed commit of the following:

            commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 12:41:40 2024 +0800

                Fix key bugs

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to supprot multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * blacl

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * List task #num sorted

            * Update prompt messages for image-related tasks

            * Delete unused task configuration files

            * Remove coco_train.yaml configuration file

            * Update task name in mmmu.yaml

            * Fix error message for missing tasks

            * Add wandb import and integration

            ---------

            Co-authored-by: Fanyi Pu <[email protected]>
            Co-authored-by: kcz358 <[email protected]>

        * Remove scienceqa_img task configuration

        * eval scienceqa with no images

        ---------

        Co-authored-by: Bo Li <[email protected]>
        Co-authored-by: kcz358 <[email protected]>

    * Update hb_doc_to_text function to remove unnecessary line break

    * Add Fuyu model and update OtterHD model

    * Refactor model response handling and fix image processing bug

    * Refactor flatten method to support only getting the first element

    * Add support for specifying timezone in datetime string

    Update flatten method in OtterHD class

    Update get_datetime_str function in utils.py

    * Fix condition for checking wandb_args_dict in __main__.py

    * Commented out assertions for batch size in Fuyu model

    * Add warning message for existing output file

    * Fix batch size issue in OtterHD model

    * Squashed commit of the following:

    commit 7dd84f337cf1ce906dfeb92118e6c2998707a79a
    Author: Li Bo <[email protected]>
    Date:   Wed Jan 31 16:00:22 2024 +0800

        [Datasets] add hallubench (#34)

        * Add hallu bench

        * Fix hall_b gpt eval bugs

        ---------

        Co-authored-by: kcz358 <[email protected]>

    commit a781057ad07b0a60c7ef682f864be598b2436b7c
    Author: Li Bo <[email protected]>
    Date:   Wed Jan 31 14:23:15 2024 +0800

        [Datasets & Models] Fuyu, HalluBench (w/Kaichen, commit 96d95b3) (#33)

        * add fuyu

        * Merge commit '6d570ac1d98a03585c8119ccb362e13ab2172fed'

        * Squashed commit of the following:

        commit 09c64b7491cd19d4e6c4a6e1a38254eaa74d0032
        Author: kcz358 <[email protected]>
        Date:   Tue Jan 30 19:39:57 2024 +0800

            Add hallu bench

        commit 6d570ac1d98a03585c8119ccb362e13ab2172fed
        Author: Pu Fanyi <[email protected]>
        Date:   Tue Jan 30 14:52:51 2024 +0800

            scienceqa for full set (#32)

            * Remove unused code and configuration file

            * Remove docvqa.yaml and update vizwizvqa.yaml

            * lint

            * Add dataset_kwargs to vizwizvqa.yaml

            * Add dataset_kwargs to vizwizvqa.yaml

            * textvqa (#27)

            * Update textvqa.yaml and utils.py

            * Fix YAML formatting in textvqa.yaml and remove unused files

            * remove useless matric

            * add textvqa val & test

            * Update progress bar description in evaluator.py

            * Update submission file names in VizWizVQA tasks

            * Update output path to include log samples suffix

            * Update submission file paths in OKVQA and VizWizVQA tasks

            * Refactor llava-in-the-wild.yaml and utils.py

            * Update metric for llava evaluation

            * Refactor logging message in Task class

            * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

            * Fix formatting issues and add progress bar closing statements

            * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

            * Update tqdm progress bar in OtterHD model

            * Squashed commit of the following:

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to supprot multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * blacl

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * Fix error handling in loading YAML config files

            * Squashed commit of the following:

            commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 12:41:40 2024 +0800

                Fix key bugs

            commit eae210c3700a59b7d5cc9de46fcb855f443096aa
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:46:19 2024 +0800

                Black lint

            commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
            Merge: ab898e4 fb209e4
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:45:31 2024 +0800

                Merge branch 'main' into kc/list_tasks_num

            commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:44:23 2024 +0800

                Enable list all tasks num

            commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
            Author: kcz358 <[email protected]>
            Date:   Sun Jan 28 09:41:32 2024 +0800

                Exclude train yaml file in the task list

            commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
            Author: Zhang Peiyuan <[email protected]>
            Date:   Sun Jan 28 02:04:57 2024 +0800

                Add InfoVQA, DocVQA, and QwenVL (#28)

                * add mmme

                * black

                * add model specific prompt and gen kwargs

                * black

                * add yaml config to supprot multi-model eval

                * print table at the end

                * refactor multi model code

                * add chartqa

                * black

                * add ai2d

                * black

                * update chartqa

                * blacl

                * update ai2d dataset

                * black

                * add qwenvl

                * add infovqa and docvqa

            * List task #num sorted

            * Update prompt messages for image-related tasks

            * Delete unused task configuration files

            * Remove coco_train.yaml configuration file

            * Update task name in mmmu.yaml

            * Fix error message for missing tasks

            * Add wandb import and integration

            * Update generation kwargs for LMMS tasks

            * Update lmms_eval MME task configuration and utils

            * Update generation_kwargs in lmms_eval tasks

            * Update doc_to_text function in coco and okvqa tasks

            * Add COCO 2017 version

            * Update task name in coco_test2017.yaml

            * Squashed commit of the following:

            commit fbb7aa57856f800d6c18413318830f4bbc6c8157
            Author: Zhang Peiyuan <[email protected]>
            Date:   Mon Jan 29 22:41:33 2024 +0800

                Add/mmmu test (#30)

                * mmmu_test

                * black

            commit b8ba33c2a349cb5b479e14af1a2d30f15ad53010
            Author: Li Bo <[email protected]>
            Date:   Sun Jan 28 22:19:13 2024 +0800

                [Dataset Check] dataset check and add wandb logging (#29)

                * Remove unused code and configuration file

                * Remove docvqa.yaml and update vizwizvqa.yaml

                * lint

                * Add dataset_kwargs to vizwizvqa.yaml

                * Add dataset_kwargs to vizwizvqa.yaml

                * textvqa (#27)

                * Update textvqa.yaml and utils.py

                * Fix YAML formatting in textvqa.yaml and remove unused files

                * remove useless matric

                * add textvqa val & test

                * Update progress bar description in evaluator.py

                * Update submission file names in VizWizVQA tasks

                * Update output path to include log samples suffix

                * Update submission file paths in OKVQA and VizWizVQA tasks

                * Refactor llava-in-the-wild.yaml and utils.py

                * Update metric for llava evaluation

                * Refactor logging message in Task class

                * Merge commit 'f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e'

                * Fix formatting issues and add progress bar closing statements

                * Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

                * Update tqdm progress bar in OtterHD model

                * Squashed commit of the following:

                commit eae210c3700a59b7d5cc9de46fcb855f443096aa
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:46:19 2024 +0800

                    Black lint

                commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
                Merge: ab898e4 fb209e4
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:45:31 2024 +0800

                    Merge branch 'main' into kc/list_tasks_num

                commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:44:23 2024 +0800

                    Enable list all tasks num

                commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:41:32 2024 +0800

                    Exclude train yaml file in the task list

                commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
                Author: Zhang Peiyuan <[email protected]>
                Date:   Sun Jan 28 02:04:57 2024 +0800

                    Add InfoVQA, DocVQA, and QwenVL (#28)

                    * add mmme

                    * black

                    * add model specific prompt and gen kwargs

                    * black

                    * add yaml config to supprot multi-model eval

                    * print table at the end

                    * refactor multi model code

                    * add chartqa

                    * black

                    * add ai2d

                    * black

                    * update chartqa

                    * blacl

                    * update ai2d dataset

                    * black

                    * add qwenvl

                    * add infovqa and docvqa

                * Fix error handling in loading YAML config files

                * Squashed commit of the following:

                commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 12:41:40 2024 +0800

                    Fix key bugs

                commit eae210c3700a59b7d5cc9de46fcb855f443096aa
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:46:19 2024 +0800

                    Black lint

                commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
                Merge: ab898e4 fb209e4
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:45:31 2024 +0800

                    Merge branch 'main' into kc/list_tasks_num

                commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:44:23 2024 +0800

                    Enable list all tasks num

                commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
                Author: kcz358 <[email protected]>
                Date:   Sun Jan 28 09:41:32 2024 +0800

                    Exclude train yaml file in the task list

                commit f92c3d6d10a8b0b7a0b42baa60cb364b99525b4e
                Author: Zhang Peiyuan <[email protected]>
                Date:   Sun Jan 28 02:04:57 2024 +0800

                    Add InfoVQA, DocVQA, and QwenVL (#28)

                    * add mmme

                    * black

                    * add model specific prompt and gen kwargs

                    * black

                    * add yaml config to supprot multi-model eval

                    * print table at the end

                    * refactor multi model code

                    * add chartqa

                    * black

                    * add ai2d

                    * black

                    * update chartqa

                    * blacl

                    * update ai2d dataset

                    * black

                    * add qwenvl

                    * add infovqa and docvqa

                * List task #num sorted

                * Update prompt messages for image-related tasks

                * Delete unused task configuration files

                * Remove coco_train.yaml configuration file

                * Update task name in mmmu.yaml

                * Fix error message for missing tasks

                * Add wandb import and integration

                ---------

                Co…
Luodian and pufanyi authored Feb 5, 2024
1 parent e4f2756 commit 787afd5
Showing 3 changed files with 8 additions and 10 deletions.
6 changes: 2 additions & 4 deletions lmms_eval/__main__.py
@@ -171,15 +171,13 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
     else:
         is_main_process = False

-    # run each config
-    args.is_main_process = is_main_process
     for args in args_list:
-        if args.is_main_process:
+        if is_main_process:
             wandb_logger = WandbLogger(args)
         results = cli_evaluate_single(args)

         accelerator.wait_for_everyone()
-        if args.is_main_process:
+        if is_main_process:
             wandb_logger.log_eval_result(results)
             wandb_logger.write_to_report(results)
             wandb_logger.finish()
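
One plausible reading of this hunk: the old code set `is_main_process` as an attribute on the outer `args` object, but the `for args in args_list` loop rebinds the name `args`, so the attribute is only visible if the list entries happen to be that same object. The sketch below illustrates that reading with hypothetical `Namespace` objects. How `args_list` is actually built is not shown in the diff, so treat the setup as an assumption.

```python
from argparse import Namespace

# Hypothetical stand-ins: the parsed CLI args and per-config args objects.
args = Namespace(config="multi.yaml")
args_list = [Namespace(tasks="mme"), Namespace(tasks="gqa")]

is_main_process = True
args.is_main_process = is_main_process  # old approach: flag stored on the outer object

# The loop rebinds `args` to each list entry, none of which carries the flag.
old_style = []
for args in args_list:
    old_style.append(getattr(args, "is_main_process", None))

# New approach: read the local variable, which does not depend on `args`.
new_style = [is_main_process for _ in args_list]

print(old_style)  # [None, None]
print(new_style)  # [True, True]
```

Under this assumption, `args.is_main_process` inside the loop would raise `AttributeError` in the old code, while the local `is_main_process` is always defined.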
2 changes: 1 addition & 1 deletion lmms_eval/logging_utils.py
@@ -60,7 +60,7 @@ def __init__(self, args):
     def init_run(self):
         if "name" not in self.wandb_args:
             if "config" in self.all_args_dict and self.all_args_dict["config"] != "":
-                self.wandb_args["name"] = self.all_args_dict["config"].split("/")[-1].split(".")[0]
+                self.wandb_args["name"] = self.all_args_dict["config"].split("/")[-1].replace(".yaml", "")
             else:
                 task_names = self.args.tasks.replace(",", "/")
                 self.wandb_args["name"] = f"{self.args.model}_{task_names}_{self.args.log_samples_suffix}"
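
The difference between the two expressions shows up when the config file name itself contains a dot. A minimal sketch, using a made-up config path (the helper names `run_name_split` and `run_name_replace` are for illustration only and do not exist in lmms_eval):

```python
def run_name_split(config_path: str) -> str:
    # Old expression: keeps only the text before the FIRST dot in the file name.
    return config_path.split("/")[-1].split(".")[0]


def run_name_replace(config_path: str) -> str:
    # New expression: drops the ".yaml" suffix and keeps any other dots.
    return config_path.split("/")[-1].replace(".yaml", "")


path = "configs/llava_v1.5_7b.yaml"  # hypothetical config path
print(run_name_split(path))    # llava_v1
print(run_name_replace(path))  # llava_v1.5_7b
```

Note that `replace` removes every occurrence of `.yaml` in the file name and leaves a `.yml` extension untouched. Whether that matters depends on how configs are named in practice.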
10 changes: 5 additions & 5 deletions lmms_eval/models/llava.py
@@ -311,10 +311,10 @@ def _collate(x):
                 gen_kwargs["num_beams"] = 1

             input_ids_list = [tokenizer_image_token(prompt, self.tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt") for prompt in question_input]
-            input_ids = self.pad_sequence(input_ids_list, batch_first=True, padding_value=self.tokenizer.pad_token_id).to(self.device)
-            attention_masks = input_ids.ne(self.tokenizer.pad_token_id).to(self.device)
-            # These steps are not in LLaVA's original code, but are necessary for generation to work
+            pad_token_ids = self.tokenizer.pad_token_id if self.tokenizer.pad_token_id is not None else self.tokenizer.eos_token_id
+            input_ids = self.pad_sequence(input_ids_list, batch_first=True, padding_value=pad_token_ids).to(self.device)
+            attention_masks = input_ids.ne(pad_token_ids).to(self.device)
             # These steps are not in LLaVA's original code, but are necessary for generation to work
             # TODO: pay attention to this major generation step...
             try:
                 cont = self.model.generate(
@@ -334,12 +334,12 @@ def _collate(x):
                 eval_logger.error(f"Error {e} in generating")
                 cont = ""

-            cont_toks_list = cont.tolist()
+            # cont_toks_list = cont.tolist()
             # for cont_toks, context in zip(cont_toks_list, contexts):
             # discard context + left-padding toks if using causal decoder-only LMM
             # if self.truncate_context:
             # cont_toks = cont_toks[input_ids.shape[1] :]
-            text_outputs = self.tokenizer.batch_decode(cont_toks_list, skip_special_tokens=True)
+            text_outputs = self.tokenizer.batch_decode(cont, skip_special_tokens=True)
             # use secondary stop seqs to cut off should-have-been-stopped content post-hoc
             # if self.truncate_context:
             # for term in until:
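
The first llava.py hunk falls back to the EOS token id when the tokenizer defines no pad token, then derives the attention mask from the same id. The sketch below reproduces that logic on plain Python lists so it runs without torch. The helpers `resolve_pad_id`, `pad_batch`, and `attention_mask` are hypothetical, and right-padding is an assumption, since the diff does not show what `self.pad_sequence` does.

```python
from typing import List, Optional


def resolve_pad_id(pad_token_id: Optional[int], eos_token_id: int) -> int:
    # Same fallback as the diff: use EOS when no pad token is defined.
    return pad_token_id if pad_token_id is not None else eos_token_id


def pad_batch(seqs: List[List[int]], pad_id: int) -> List[List[int]]:
    # Right-pad every sequence to the length of the longest one (assumed side).
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


def attention_mask(batch: List[List[int]], pad_id: int) -> List[List[bool]]:
    # Mirrors input_ids.ne(pad_id): True wherever the token is not the pad id.
    return [[tok != pad_id for tok in row] for row in batch]


pad_id = resolve_pad_id(None, eos_token_id=2)  # tokenizer without a pad token
batch = pad_batch([[5, 6, 7], [8]], pad_id)
mask = attention_mask(batch, pad_id)
print(batch)  # [[5, 6, 7], [8, 2, 2]]
print(mask)   # [[True, True, True], [True, False, False]]
```

One consequence of reusing the EOS id for padding: any genuine EOS token inside a prompt would also be masked out, because the mask is computed purely by comparing against the pad id.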
