Fix types to allow nullables in llava_hf.py (#55)
Merged
Conversation
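The PR title says the type annotations in `llava_hf.py` were loosened to accept `None`. The actual diff is not shown on this page, so the following is only a minimal sketch of what such a fix typically looks like in Python: annotating parameters that default to `None` as `Optional[...]` instead of a bare concrete type. The function and parameter names here are hypothetical, not taken from the repository.

```python
from typing import Optional


def load_model(
    revision: Optional[str] = None,    # hypothetical parameter
    device_map: Optional[str] = None,  # hypothetical parameter
) -> dict:
    """Illustrates Optional[...] annotations that make a None default
    legal for static type checkers, falling back to defaults at runtime."""
    return {
        "revision": revision or "main",
        "device_map": device_map or "auto",
    }
```

With a bare `revision: str = None`, a checker such as mypy flags the default as incompatible; `Optional[str]` (equivalently `str | None` on Python 3.10+) makes the nullable contract explicit.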
Luodian approved these changes on Apr 12, 2024.
Luodian added a commit that referenced this pull request on Apr 16, 2024:
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
Luodian added a commit that referenced this pull request on Apr 16, 2024.
Luodian added a commit that referenced this pull request on Apr 16, 2024:

…d context issue (#59)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 6b20902
Author: Zhang Peiyuan <[email protected]>
Date: Thu Feb 29 13:40:02 2024 +0800

Dev/py add models (#57)
* add instructblip
* minicpm_v
* remove <image> from qwen-vl
* speed up postprocessing
* Optimize build context speed
---------
Co-authored-by: Pu Fanyi <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit 21050ba
Author: Pu Fanyi <[email protected]>
Date: Wed Feb 28 14:49:07 2024 +0800

Pufanyi/flickr30k refractor (#56)
* refactor vizwizvqa task
* Delete vqav2_test and vqav2_val YAML files
* Refactor vqav2_process_results functions
* Add a pack for vqav2
* refactor okvqa
* roll back vizwiz_vqa
* Fix exact_match calculation in ok_vqa_process_results
* Update OKVQA dataset name in readme
* add model_specific_prompt_kwargs
* add model_specific_prompt_kwargs to vizwiz_vqa
* add model_specific_prompt_kwargs for vqav2
* lint
* fix a small bug for eval_logger
* Refactor make_table function to display points as " - " if value is None
* Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'
* Refactor ok_vqa_aggreate_submissions function
* Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'
* Refactor VQA submission file saving
* Update file utils
* Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava
---------
Co-authored-by: JvThunder <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit ba0e7f5
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (#55)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
---------
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
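Several bullets in the commit message above concern making models tolerate text-only questions ("Fix llava can not handle only text question (no visuals)", and the same for qwen). This is a generic sketch of the guard pattern such fixes usually introduce, not the repository's actual code: treat a missing or empty visuals list as a pure-text request instead of assuming at least one image.

```python
def build_inputs(prompt, visuals=None):
    """Build model inputs, tolerating an absent or empty visuals list.

    `build_inputs`, its parameters, and the '<image>' placeholder token
    are illustrative assumptions, not lmms_eval's real API.
    """
    if not visuals:  # None or [] -> text-only request, no image token
        return {"text": prompt, "images": []}
    # One leading image placeholder plus the question, images attached
    return {"text": "<image>\n" + prompt, "images": list(visuals)}
```

Without the `if not visuals` branch, code that unconditionally prepends an image token (or indexes `visuals[0]`) raises on text-only benchmark items, which is the failure mode the fix addresses.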
Luodian added a commit that referenced this pull request on Apr 16, 2024.
Luodian added a commit that referenced this pull request on Apr 16, 2024.
Luodian added a commit that referenced this pull request on Apr 16, 2024.
Luodian
added a commit
that referenced
this pull request
Apr 16, 2024
) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 21050ba Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file 
path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load 
from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit faf9cf65cf5b1e036ee3a74428e8bb1490e8b2eb Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e3729eb925b718a44b6eb225ef9b41c7fd2408e0 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 50b697a7ae93b0547484e1cd753722c1d2513349 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 17425b5dce41cf67b96c5875139b57d6c7a423df Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 1bc17d54e79e79d11419ba89e7d8e55bc8cfa21b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit a20bbc30ab576d3e2a587c70af1b7c06575bcd8b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit e2b657694b888ef59b9f896415e7c4c82497e7bf Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 6447d521842b9f83f5119cdcd7714c8f6053ca73 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data
commit 8ac333a2e9ebbe6318d536b6589f767f71fbc092 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 9e542ce049f68f49a237be165e3ad9cde7408ac0 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type
commit f90ccf7b94b130e118b4eca321f68b81e7ab5850 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names
commit f651a77707a4c723ebffb07f2a87743bf42ecea7 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path
commit a683559c704806b7abde5e4c8355f556f3e65866 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path
commit 8e246e2466f3dd14a5e34f720269d7991a6dcf6b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict
commit 67f00dc4652d09c662e5202ff7e5fbf7bebcdaf6 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 53b7a845fe8412a652905101ec036c84e77a20c2 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info
commit 920b4112c4508e9a8afe824678958f2e78189e4e Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path
commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57)
commit 21050ba Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56)
commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (#55)
* Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following:
commit 74fff73053b88a90d0f4229a9c748256080fea08 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix
commit 0c640a636e3882859a17e30a5c3504850a3d02d6 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs
commit 7f2b2c3 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (#61) --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
commit bebff9fad2a60bc0ac52ddc430e5d9e4e0ef6c24 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix
commit 5042bb0c2ed4f830dda6bcd14231b1f8763aa95f Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping
commit c82042b Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (#59)
commit d78a3d7a53f5285a7eac39ce8f04e9854fdb3e73 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args
commit 8eefaec8489d48613de9395eb8e8150224985e01 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num
* Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
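Several of the commit bullets above ("Fix llava can not handle only text question (no visuals)", and this PR's titular fix of allowing nullables in llava_hf.py) come down to the same pattern: widening type hints with `Optional` so a text-only query, which has no images, can pass `None` without tripping type checks. The sketch below is a hypothetical illustration of that pattern, not the actual llava_hf.py code; `build_prompt` and its signature are invented for this example.

```python
from typing import List, Optional, Tuple

def build_prompt(context: str, visuals: Optional[List[str]] = None) -> Tuple[str, int]:
    """Accept a possibly-None list of visuals; a text-only question has none."""
    # Treat None the same as an empty list so callers need no special casing.
    n_images = 0 if visuals is None else len(visuals)
    # Prepend one <image> placeholder token per visual, as multimodal prompts do.
    prefix = "<image>\n" * n_images
    return prefix + context, n_images

# Text-only question: visuals stays None and no <image> token is prepended.
prompt, n = build_prompt("What is the capital of France?")
```

Without the `Optional[...]` annotation (and the `None` guard), a static type checker would reject text-only calls, which is the class of error the fix addresses.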
Luodian added a commit that referenced this pull request on Apr 16, 2024
) * Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

  commit 2a45079 (Zhang Peiyuan, Thu Feb 29 2024): Dev/py add models (#57)
  * add instructblip
  * minicpm_v
  * remove <image> from qwen-vl
  * speed up postprocessing
  * Optimize build context speed
  Co-authored-by: Pu Fanyi <[email protected]>
  Co-authored-by: kcz358 <[email protected]>

  commit 7bdab7a (Pu Fanyi, Wed Feb 28 2024): Pufanyi/flickr30k refractor (#56)
  * refactor vizwizvqa task
  * Delete vqav2_test and vqav2_val YAML files
  * Refactor vqav2_process_results functions
  * Add a pack for vqav2
  * refactor okvqa
  * roll back vizwiz_vqa
  * Fix exact_match calculation in ok_vqa_process_results
  * Update OKVQA dataset name in readme
  * add model_specific_prompt_kwargs
  * add model_specific_prompt_kwargs to vizwiz_vqa
  * add model_specific_prompt_kwargs for vqav2
  * lint
  * fix a small bug for eval_logger
  * Refactor make_table function to display points as " - " if value is None
  * Merge commit '90f42f0876a4914c5ac0d213b9dffbfb4797ff62'
  * Refactor ok_vqa_aggreate_submissions function
  * Merge commit '4afec3303a0a7ed27a8265565343bf2851b9e4c7'
  * Refactor VQA submission file saving
  * Update file utils
  * Merge commit 'c144b75f0c9145a625b2bbdef5123ed81e343a11'
  * Refactor file path handling and submission generation
  * OKVQA path
  * vizwizvqa file
  * pack cmmmu
  * fix a small metric bug for cmmmu
  * Add higher_is_better flag to submission metric
  * Add CMMMU dataset to README.md
  * Add logging and refactor submission file generation in docvqa utils.py
  * pack docvqa
  * add traceback to print detailed error
  * Refactor docvqa_test_aggregate_results to accept additional arguments
  * Add metric check in evaluator.py and update test.yaml and val.yaml
  * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
  * merge textvqa
  * textvqa
  * Modify submission file generation for COCO test results
  * Update test result storage path
  * update coco cap file name
  * Update COCO 2017 Caption dataset name
  * ferret
  * Add Ferret dataset
  * Refactor hb_doc_to_text function to include model-specific prompts
  * Add IconQA and its subtasks
  * Refactor image list creation in doc_to_visual function
  * Add process_results function to default template
  * Update process_results function in iconqa utils.py
  * refactor flickr30k
  * change aggregation function
  * Fix formatting issues and update logging message
  * Fix llava can not handle only text question (no visuals)
  * Fix qwen can not handle no image question (no visuals)
  * Add fuyu prepare accelerator scripts
  * refactor mme
  * naming consistency
  * aggregation_submissions consistency
  * flickr30k naming consistency
  * remove submissions for mme
  * remove unused submission function
  * Refactor infovqa_test.yaml and infovqa_val.yaml
  * Refactor code for improved readability and maintainability
  * stvqa
  * remane sqa
  * Update lmms_eval textcaps files and utils.py
  * Update default prompt for text captions
  * Refactor textcaps_aggregation_result function
  * Add generate_submission_file function and update mathvista_aggregate_results signature
  * Update nocaps_test.yaml and nocaps_val.yaml
  * refractor internal_eval
  * Add internal evaluation datasets
  * pack multidocvqa
  * mmvet
  * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
  * Refractor llava wild
  * Refractor llava-bench-coco
  * Add JSON file generation for gpt evaluation details
  * mmmu
  * Remove MMBench English and Chinese tasks
  * Remove unnecessary return statement in mmbench_aggregate_test_results function
  * Fix distributed process group initialization
  * Update dataset paths and group names in mmbench test configs
  * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
  * Add torch module import
  * lint
  * Remove IconQA dataset from README.md
  * Add Multi-DocVQA and its submodules
  * Add new datasets and update task names
  * Refactor flickr_aggregation_result function to accept additional arguments
  * Add timeout kwargs in Accelerator constructor
  * Add encoding to be utf-8 for cmmmu
  * Fix llava try and catch, remove torch.distributed.init in main
  * Ds prepare script for llava
  Co-authored-by: JvThunder <[email protected]>
  Co-authored-by: kcz358 <[email protected]>

  commit d3dfd94 (Li Bo, Tue Feb 27 2024): [Wandb Logger] add models, and args to wandb tables. (#55)
  * Refactor logging in lmms_eval package
  * Refactor variable names in lmms_eval package

* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Add timeout to API requests
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix error logging in get_chat_response function
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Squashed commit of the following:

  commit 2fbeafc882c80242a10381abc67629d5d8b7071a (kcz358, Sat Mar 2 2024): Add conditional exclude for internal eval
  commit f188052450bed2f3a30ab6f9a6f7eb844a64cb33 (kcz358, Sat Mar 2 2024): Merge branch 'dev/readme' into kc/final_fix
  commit baef5905505892593fe783beb18a2de20991d6af (kcz358, Sat Mar 2 2024): Fix seedbench2 image issue in doc_to_text
  commit 11b46f3b701b79b361dd5175a263e4d89bd07fb5 (kcz358, Fri Mar 1 2024): Delete unnecessary things
  commit 0982de2e7a2310429e51ec7828886fd49953f716 (kcz358, Fri Mar 1 2024): Testing
  commit f840ed80f4ae467fff62b61844854a3a9e8ec8a5 (kcz358, Fri Mar 1 2024): Forcing an empty commit.
  commit 80db78f600d07011188983637c94da84b9475fbf (kcz358, Fri Mar 1 2024): Merge branch 'kc/final_fix' into dev/readme
  commit 676229de870b8d465cef08867cd272a4b696e630 (kcz358, Fri Mar 1 2024): Remove unnecessary img in data
  commit d293b96fb3537fea85f10f216d762abf35e05e8d (kcz358, Fri Mar 1 2024): Merge branch 'kc/final_fix' into dev/readme
  commit 01bbd010590d6b7f105525580209191a1d6d5232 (kcz358, Fri Mar 1 2024): More robust check by type
  commit 66595ebc073ff9431f2400006196c0645be58ea4 (kcz358, Fri Mar 1 2024): Change remove image to check by type instead of check by names
  commit 08c2ebad1532fd6c34ac04efb94a268db9862d4f (kcz358, Fri Mar 1 2024): Rename hallubench gpt output path
  commit aefbd3c6856584135e2dcbe13381db0e0780f063 (kcz358, Fri Mar 1 2024): Fix hallusionbench gpt json saving path
  commit b9aebc3ff3b122d6d4a81bd2f28e86b2c390c505 (kcz358, Fri Mar 1 2024): Resolve conflict
  commit c9daa91f2576de69af73c80e263afb085ecd8288 (kcz358, Fri Mar 1 2024): Merge branch 'kc/final_fix' into dev/readme
  commit b1c4c88b9b36e02e9ed738ff9217d98a5ef2117b (kcz358, Fri Mar 1 2024): Record gpt response in eval info
  commit b35bc4a6c8fd6b4b2a68bb3054878807b8b92281 (kcz358, Fri Mar 1 2024): Fix refcocog dataset path
  commit 2a45079 (Zhang Peiyuan, Thu Feb 29 2024): Dev/py add models (#57)
  commit 7bdab7a (Pu Fanyi, Wed Feb 28 2024): Pufanyi/flickr30k refractor (#56)
  commit d3dfd94 (Li Bo, Tue Feb 27 2024): [Wandb Logger] add models, and args to wandb tables. (#55)

* Fix small bugs in list_with_num
* Revise list_with_num model args
* Dev/readme rm rolling (#60)
  * remove log_likelyhood_rolling
  * Update time efficiency benchmark in README.md
  * add task guide
  Co-authored-by: jzhang38 <[email protected]>
  Co-authored-by: kcz358 <[email protected]>
* Remove unnecessary code and update dependencies
* Fix logging utils bug on wandb grouping
* Add reproduce envs
* Squashed commit of the following:

  commit 556b12620379d79c9ed5ddba0856063b498f917c (kcz358, Sun Mar 3 2024): Merge branch 'main' into kc/final_fix
  commit 9509a782c9e9824273cefb1dc9671c92b887697d (kcz358, Sun Mar 3 2024): Add reproduce envs
  commit 0bff98b (kcz358, Sun Mar 3 2024): [Fix] wandb group logging missing columns (#61)
  commit 7c4501a32bbb415ba7e62e93194b37ba9a435cf5 (kcz358, Sun Mar 3 2024): Merge branch 'main' into kc/final_fix
  commit 5c419f9fa23616a63a0bd584f18e509bb7704b50 (kcz358, Sun Mar 3 2024): Fix logging utils bug on wandb grouping
  commit 0010d0a (kcz358, Sun Mar 3 2024): [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (#59)
  commit b2ca65d1f12d84ae7a37ecc81f760901389a1af0 (kcz358, Sat Mar 2 2024): Revise list_with_num model args
  commit a262ea1720b2c02839d21dad2a7618bc80725f18 (kcz358, Sat Mar 2 2024): Fix small bugs in list_with_num
  commit 2a45079 (Zhang Peiyuan, Thu Feb 29 2024): Dev/py add models (#57)
  commit 7bdab7a (Pu Fanyi, Wed Feb 28 2024): Pufanyi/flickr30k refractor (#56)
  commit d3dfd94 (Li Bo, Tue Feb 27 2024): [Wandb Logger] add models, and args to wandb tables. (#55)

* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
Luodian added a commit that referenced this pull request on Apr 16, 2024
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit faf9cf65cf5b1e036ee3a74428e8bb1490e8b2eb Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e3729eb925b718a44b6eb225ef9b41c7fd2408e0 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 50b697a7ae93b0547484e1cd753722c1d2513349 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 17425b5dce41cf67b96c5875139b57d6c7a423df Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 1bc17d54e79e79d11419ba89e7d8e55bc8cfa21b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit a20bbc30ab576d3e2a587c70af1b7c06575bcd8b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit e2b657694b888ef59b9f896415e7c4c82497e7bf Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 6447d521842b9f83f5119cdcd7714c8f6053ca73 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 8ac333a2e9ebbe6318d536b6589f767f71fbc092 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9e542ce049f68f49a237be165e3ad9cde7408ac0 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit f90ccf7b94b130e118b4eca321f68b81e7ab5850 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit f651a77707a4c723ebffb07f2a87743bf42ecea7 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit a683559c704806b7abde5e4c8355f556f3e65866 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 8e246e2466f3dd14a5e34f720269d7991a6dcf6b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 67f00dc4652d09c662e5202ff7e5fbf7bebcdaf6 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 53b7a845fe8412a652905101ec036c84e77a20c2 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit 920b4112c4508e9a8afe824678958f2e78189e4e Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize 
build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 21050ba Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results 
function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 
<[email protected]> commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit 74fff73053b88a90d0f4229a9c748256080fea08 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 0c640a636e3882859a17e30a5c3504850a3d02d6 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 7f2b2c3 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57) * add 
instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 21050ba Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its 
subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for 
llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit bebff9fad2a60bc0ac52ddc430e5d9e4e0ef6c24 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 5042bb0c2ed4f830dda6bcd14231b1f8763aa95f Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit c82042b Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 21050ba Author: Pu Fanyi <[email 
protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and 
update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. 
(#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit d78a3d7a53f5285a7eac39ce8f04e9854fdb3e73 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 8eefaec8489d48613de9395eb8e8150224985e01 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit faf9cf65cf5b1e036ee3a74428e8bb1490e8b2eb Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e3729eb925b718a44b6eb225ef9b41c7fd2408e0 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 50b697a7ae93b0547484e1cd753722c1d2513349 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 17425b5dce41cf67b96c5875139b57d6c7a423df Author: kcz358 <[email protected]> 
Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 1bc17d54e79e79d11419ba89e7d8e55bc8cfa21b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit a20bbc30ab576d3e2a587c70af1b7c06575bcd8b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit e2b657694b888ef59b9f896415e7c4c82497e7bf Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 6447d521842b9f83f5119cdcd7714c8f6053ca73 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 8ac333a2e9ebbe6318d536b6589f767f71fbc092 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9e542ce049f68f49a237be165e3ad9cde7408ac0 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit f90ccf7b94b130e118b4eca321f68b81e7ab5850 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit f651a77707a4c723ebffb07f2a87743bf42ecea7 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit a683559c704806b7abde5e4c8355f556f3e65866 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 8e246e2466f3dd14a5e34f720269d7991a6dcf6b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 67f00dc4652d09c662e5202ff7e5fbf7bebcdaf6 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 53b7a845fe8412a652905101ec036c84e77a20c2 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit 
920b4112c4508e9a8afe824678958f2e78189e4e Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 6b20902 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 21050ba Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission 
file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update 
task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit ba0e7f5 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
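The pull request itself ("Fix types to allow nullables in llava_hf.py") concerns type annotations that should admit `None` — the commit log above mentions the related runtime fix "llava can not handle only text question (no visuals)". As a hedged illustration only (the helper name `build_prompt` and its behavior are hypothetical, not taken from `llava_hf.py`), a nullable parameter in Python is expressed with `Optional`:

```python
from typing import Optional


def build_prompt(question: str, visuals: Optional[list] = None) -> str:
    """Hypothetical helper: accepts None for text-only questions.

    Annotating `visuals` as Optional[list] (rather than bare `list`)
    lets type checkers accept call sites that pass None, which is the
    kind of signature change the PR title describes.
    """
    if visuals is None:  # treat a missing image list as empty
        visuals = []
    return "<image>" * len(visuals) + question
```

For example, calling `build_prompt("What is shown?")` returns the question unchanged, while passing a one-element image list prepends a single `<image>` token.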
Luodian added a commit that referenced this pull request on Apr 16, 2024
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 2a45079 (Zhang Peiyuan <[email protected]>, Thu Feb 29 13:40:02 2024 +0800): Dev/py add models (#57)
commit 7bdab7a (Pu Fanyi <[email protected]>, Wed Feb 28 14:49:07 2024 +0800): Pufanyi/flickr30k refractor (#56)
commit d3dfd94 (Li Bo <[email protected]>, Tue Feb 27 22:52:07 2024 +0800): [Wandb Logger] add models, and args to wandb tables. (#55)

* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Add timeout to API requests
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix error logging in get_chat_response function
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Squashed commit of the following:

commit 2fbeafc882c80242a10381abc67629d5d8b7071a (kcz358 <[email protected]>, Sat Mar 2 03:49:36 2024 +0000): Add conditional exclude for internal eval
commit f188052450bed2f3a30ab6f9a6f7eb844a64cb33 (kcz358 <[email protected]>, Sat Mar 2 03:24:29 2024 +0000): Merge branch 'dev/readme' into kc/final_fix
commit baef5905505892593fe783beb18a2de20991d6af (kcz358 <[email protected]>, Sat Mar 2 02:47:31 2024 +0000): Fix seedbench2 image issue in doc_to_text
commit 11b46f3b701b79b361dd5175a263e4d89bd07fb5 (kcz358 <[email protected]>, Fri Mar 1 15:32:49 2024 +0000): Delete unnecessary things
commit 0982de2e7a2310429e51ec7828886fd49953f716 (kcz358 <[email protected]>, Fri Mar 1 15:31:42 2024 +0000): Testing
commit f840ed80f4ae467fff62b61844854a3a9e8ec8a5 (kcz358 <[email protected]>, Fri Mar 1 15:29:30 2024 +0000): Forcing an empty commit.
commit 80db78f600d07011188983637c94da84b9475fbf (kcz358 <[email protected]>, Fri Mar 1 15:24:56 2024 +0000): Merge branch 'kc/final_fix' into dev/readme
commit 676229de870b8d465cef08867cd272a4b696e630 (kcz358 <[email protected]>, Fri Mar 1 15:24:20 2024 +0000): Remove unnecessary img in data
commit d293b96fb3537fea85f10f216d762abf35e05e8d (kcz358 <[email protected]>, Fri Mar 1 13:41:24 2024 +0000): Merge branch 'kc/final_fix' into dev/readme
commit 01bbd010590d6b7f105525580209191a1d6d5232 (kcz358 <[email protected]>, Fri Mar 1 13:40:51 2024 +0000): More robust check by type
commit 66595ebc073ff9431f2400006196c0645be58ea4 (kcz358 <[email protected]>, Fri Mar 1 13:00:57 2024 +0000): Change remove image to check by type instead of check by names
commit 08c2ebad1532fd6c34ac04efb94a268db9862d4f (kcz358 <[email protected]>, Fri Mar 1 12:33:02 2024 +0000): Rename hallubench gpt output path
commit aefbd3c6856584135e2dcbe13381db0e0780f063 (kcz358 <[email protected]>, Fri Mar 1 09:32:52 2024 +0000): Fix hallusionbench gpt json saving path
commit b9aebc3ff3b122d6d4a81bd2f28e86b2c390c505 (kcz358 <[email protected]>, Fri Mar 1 08:51:13 2024 +0000): Resolve conflict
commit c9daa91f2576de69af73c80e263afb085ecd8288 (kcz358 <[email protected]>, Fri Mar 1 08:37:21 2024 +0000): Merge branch 'kc/final_fix' into dev/readme
commit b1c4c88b9b36e02e9ed738ff9217d98a5ef2117b (kcz358 <[email protected]>, Fri Mar 1 07:55:03 2024 +0000): Record gpt response in eval info
commit b35bc4a6c8fd6b4b2a68bb3054878807b8b92281 (kcz358 <[email protected]>, Fri Mar 1 07:49:01 2024 +0000): Fix refcocog dataset path
commit 2a45079: Dev/py add models (#57)
commit 7bdab7a: Pufanyi/flickr30k refractor (#56)
commit d3dfd94: [Wandb Logger] add models, and args to wandb tables. (#55)

* Fix small bugs in list_with_num
* Revise list_with_num model args
* Dev/readme rm rolling (#60): remove log_likelyhood_rolling; Update time efficiency benchmark in README.md; add task guide (Co-authored-by: jzhang38 <[email protected]>, kcz358 <[email protected]>)
* Remove unnecessary code and update dependencies
* Fix logging utils bug on wandb grouping
* Add reproduce envs
* Squashed commit of the following:

commit 556b12620379d79c9ed5ddba0856063b498f917c (kcz358 <[email protected]>, Sun Mar 3 22:12:12 2024 +0800): Merge branch 'main' into kc/final_fix
commit 9509a782c9e9824273cefb1dc9671c92b887697d (kcz358 <[email protected]>, Sun Mar 3 22:11:04 2024 +0800): Add reproduce envs
commit 0bff98b (kcz358 <[email protected]>, Sun Mar 3 21:19:15 2024 +0800): [Fix] wandb group logging missing columns (#61)
commit 7c4501a32bbb415ba7e62e93194b37ba9a435cf5 (kcz358 <[email protected]>, Sun Mar 3 07:25:48 2024 +0000): Merge branch 'main' into kc/final_fix
commit 5c419f9fa23616a63a0bd584f18e509bb7704b50 (kcz358 <[email protected]>, Sun Mar 3 07:23:19 2024 +0000): Fix logging utils bug on wandb grouping
commit 0010d0a (kcz358 <[email protected]>, Sun Mar 3 13:01:11 2024 +0800): [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (#59)
commit b2ca65d1f12d84ae7a37ecc81f760901389a1af0 (kcz358 <[email protected]>, Sat Mar 2 05:58:08 2024 +0000): Revise list_with_num model args
commit a262ea1720b2c02839d21dad2a7618bc80725f18 (kcz358 <[email protected]>, Sat Mar 2 05:09:15 2024 +0000): Fix small bugs in list_with_num

---------
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: kcz358 <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
Co-authored-by: kcz358 <[email protected]>
Luodian added a commit that referenced this pull request Apr 16, 2024
Fix types to allow nullables in `llava_hf.py`
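Judging only by the PR title, the change presumably widens type annotations in `llava_hf.py` so that callers may pass `None` (i.e. `Optional[...]` instead of bare types). A minimal, hypothetical sketch of that kind of fix — the function name, parameters, and defaults below are illustrative, not the actual `llava_hf.py` code:

```python
from typing import Optional

# Hypothetical illustration of allowing nullables in a signature:
# parameters that were annotated `int` / `list[str]` become Optional,
# with explicit None handling inside the function.
def generate_until(
    prompt: str,
    max_new_tokens: Optional[int] = None,  # was `int`; None is now accepted
    stop: Optional[list[str]] = None,      # was `list[str]`; None is now accepted
) -> str:
    if max_new_tokens is None:
        max_new_tokens = 256  # fall back to a default when unspecified
    stops = stop if stop is not None else []
    return f"generated up to {max_new_tokens} tokens, stopping on {stops}"
```

Without the `Optional` annotations, a static type checker such as mypy would flag `generate_until("hi", None)` as passing `None` where `int` is expected, even though the runtime code handles it.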
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
…b#55)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
…d context issue (EvolvingLMMs-Lab#59)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 5c6e0c8
Author: Zhang Peiyuan <[email protected]>
Date: Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (EvolvingLMMs-Lab#57)

    * add instructblip
    * minicpm_v
    * remove <image> from qwen-vl
    * speed up postprocessing
    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi <[email protected]>
    Co-authored-by: kcz358 <[email protected]>

commit 8bd568e
Author: Pu Fanyi <[email protected]>
Date: Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

    * refactor vizwizvqa task
    * Delete vqav2_test and vqav2_val YAML files
    * Refactor vqav2_process_results functions
    * Add a pack for vqav2
    * refactor okvqa
    * roll back vizwiz_vqa
    * Fix exact_match calculation in ok_vqa_process_results
    * Update OKVQA dataset name in readme
    * add model_specific_prompt_kwargs
    * add model_specific_prompt_kwargs to vizwiz_vqa
    * add model_specific_prompt_kwargs for vqav2
    * lint
    * fix a small bug for eval_logger
    * Refactor make_table function to display points as " - " if value is None
    * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e'
    * Refactor ok_vqa_aggreate_submissions function
    * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15'
    * Refactor VQA submission file saving
    * Update file utils
    * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813'
    * Refactor file path handling and submission generation
    * OKVQA path
    * vizwizvqa file
    * pack cmmmu
    * fix a small metric bug for cmmmu
    * Add higher_is_better flag to submission metric
    * Add CMMMU dataset to README.md
    * Add logging and refactor submission file generation in docvqa utils.py
    * pack docvqa
    * add traceback to print detailed error
    * Refactor docvqa_test_aggregate_results to accept additional arguments
    * Add metric check in evaluator.py and update test.yaml and val.yaml
    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
    * merge textvqa
    * textvqa
    * Modify submission file generation for COCO test results
    * Update test result storage path
    * update coco cap file name
    * Update COCO 2017 Caption dataset name
    * ferret
    * Add Ferret dataset
    * Refactor hb_doc_to_text function to include model-specific prompts
    * Add IconQA and its subtasks
    * Refactor image list creation in doc_to_visual function
    * Add process_results function to default template
    * Update process_results function in iconqa utils.py
    * refactor flickr30k
    * change aggregation function
    * Fix formatting issues and update logging message
    * Fix llava can not handle only text question (no visuals)
    * Fix qwen can not handle no image question (no visuals)
    * Add fuyu prepare accelerator scripts
    * refactor mme
    * naming consistency
    * aggregation_submissions consistency
    * flickr30k naming consistency
    * remove submissions for mme
    * remove unused submission function
    * Refactor infovqa_test.yaml and infovqa_val.yaml
    * Refactor code for improved readability and maintainability
    * stvqa
    * remane sqa
    * Update lmms_eval textcaps files and utils.py
    * Update default prompt for text captions
    * Refactor textcaps_aggregation_result function
    * Add generate_submission_file function and update mathvista_aggregate_results signature
    * Update nocaps_test.yaml and nocaps_val.yaml
    * refractor internal_eval
    * Add internal evaluation datasets
    * pack multidocvqa
    * mmvet
    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
    * Refractor llava wild
    * Refractor llava-bench-coco
    * Add JSON file generation for gpt evaluation details
    * mmmu
    * Remove MMBench English and Chinese tasks
    * Remove unnecessary return statement in mmbench_aggregate_test_results function
    * Fix distributed process group initialization
    * Update dataset paths and group names in mmbench test configs
    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
    * Add torch module import
    * lint
    * Remove IconQA dataset from README.md
    * Add Multi-DocVQA and its submodules
    * Add new datasets and update task names
    * Refactor flickr_aggregation_result function to accept additional arguments
    * Add timeout kwargs in Accelerator constructor
    * Add encoding to be utf-8 for cmmmu
    * Fix llava try and catch, remove torch.distributed.init in main
    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder <[email protected]>
    Co-authored-by: kcz358 <[email protected]>

commit 0e0c698
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

    * Refactor logging in lmms_eval package
    * Refactor variable names in lmms_eval package

* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
…d context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * 
Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack 
multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0182d5d Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
…d context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * 
Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack 
multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
…d context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * 
Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack 
multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
…d context issue (EvolvingLMMs-Lab#59)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

  commit fffe545
  Author: Zhang Peiyuan <[email protected]>
  Date: Thu Feb 29 13:40:02 2024 +0800

      Dev/py add models (EvolvingLMMs-Lab#57)

      * add instructblip
      * minicpm_v
      * remove <image> from qwen-vl
      * speed up postprocessing
      * Optimize build context speed

      ---------

      Co-authored-by: Pu Fanyi <[email protected]>
      Co-authored-by: kcz358 <[email protected]>

  commit c608dd6
  Author: Pu Fanyi <[email protected]>
  Date: Wed Feb 28 14:49:07 2024 +0800

      Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

      * refactor vizwizvqa task
      * Delete vqav2_test and vqav2_val YAML files
      * Refactor vqav2_process_results functions
      * Add a pack for vqav2
      * refactor okvqa
      * roll back vizwiz_vqa
      * Fix exact_match calculation in ok_vqa_process_results
      * Update OKVQA dataset name in readme
      * add model_specific_prompt_kwargs
      * add model_specific_prompt_kwargs to vizwiz_vqa
      * add model_specific_prompt_kwargs for vqav2
      * lint
      * fix a small bug for eval_logger
      * Refactor make_table function to display points as " - " if value is None
      * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd'
      * Refactor ok_vqa_aggreate_submissions function
      * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4'
      * Refactor VQA submission file saving
      * Update file utils
      * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b'
      * Refactor file path handling and submission generation
      * OKVQA path
      * vizwizvqa file
      * pack cmmmu
      * fix a small metric bug for cmmmu
      * Add higher_is_better flag to submission metric
      * Add CMMMU dataset to README.md
      * Add logging and refactor submission file generation in docvqa utils.py
      * pack docvqa
      * add traceback to print detailed error
      * Refactor docvqa_test_aggregate_results to accept additional arguments
      * Add metric check in evaluator.py and update test.yaml and val.yaml
      * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
      * merge textvqa
      * textvqa
      * Modify submission file generation for COCO test results
      * Update test result storage path
      * update coco cap file name
      * Update COCO 2017 Caption dataset name
      * ferret
      * Add Ferret dataset
      * Refactor hb_doc_to_text function to include model-specific prompts
      * Add IconQA and its subtasks
      * Refactor image list creation in doc_to_visual function
      * Add process_results function to default template
      * Update process_results function in iconqa utils.py
      * refactor flickr30k
      * change aggregation function
      * Fix formatting issues and update logging message
      * Fix llava can not handle only text question (no visuals)
      * Fix qwen can not handle no image question (no visuals)
      * Add fuyu prepare accelerator scripts
      * refactor mme
      * naming consistency
      * aggregation_submissions consistency
      * flickr30k naming consistency
      * remove submissions for mme
      * remove unused submission function
      * Refactor infovqa_test.yaml and infovqa_val.yaml
      * Refactor code for improved readability and maintainability
      * stvqa
      * remane sqa
      * Update lmms_eval textcaps files and utils.py
      * Update default prompt for text captions
      * Refactor textcaps_aggregation_result function
      * Add generate_submission_file function and update mathvista_aggregate_results signature
      * Update nocaps_test.yaml and nocaps_val.yaml
      * refractor internal_eval
      * Add internal evaluation datasets
      * pack multidocvqa
      * mmvet
      * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
      * Refractor llava wild
      * Refractor llava-bench-coco
      * Add JSON file generation for gpt evaluation details
      * mmmu
      * Remove MMBench English and Chinese tasks
      * Remove unnecessary return statement in mmbench_aggregate_test_results function
      * Fix distributed process group initialization
      * Update dataset paths and group names in mmbench test configs
      * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
      * Add torch module import
      * lint
      * Remove IconQA dataset from README.md
      * Add Multi-DocVQA and its submodules
      * Add new datasets and update task names
      * Refactor flickr_aggregation_result function to accept additional arguments
      * Add timeout kwargs in Accelerator constructor
      * Add encoding to be utf-8 for cmmmu
      * Fix llava try and catch, remove torch.distributed.init in main
      * Ds prepare script for llava

      ---------

      Co-authored-by: JvThunder <[email protected]>
      Co-authored-by: kcz358 <[email protected]>

  commit a0959f1
  Author: Li Bo <[email protected]>
  Date: Tue Feb 27 22:52:07 2024 +0800

      [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

      * Refactor logging in lmms_eval package
      * Refactor variable names in lmms_eval package

* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

  commit 5c6e0c8
  Author: Zhang Peiyuan <[email protected]>
  Date: Thu Feb 29 13:40:02 2024 +0800

      Dev/py add models (EvolvingLMMs-Lab#57)

      * add instructblip
      * minicpm_v
      * remove <image> from qwen-vl
      * speed up postprocessing
      * Optimize build context speed

      ---------

      Co-authored-by: Pu Fanyi <[email protected]>
      Co-authored-by: kcz358 <[email protected]>

  commit 8bd568e
  Author: Pu Fanyi <[email protected]>
  Date: Wed Feb 28 14:49:07 2024 +0800

      Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

      * refactor vizwizvqa task
      * Delete vqav2_test and vqav2_val YAML files
      * Refactor vqav2_process_results functions
      * Add a pack for vqav2
      * refactor okvqa
      * roll back vizwiz_vqa
      * Fix exact_match calculation in ok_vqa_process_results
      * Update OKVQA dataset name in readme
      * add model_specific_prompt_kwargs
      * add model_specific_prompt_kwargs to vizwiz_vqa
      * add model_specific_prompt_kwargs for vqav2
      * lint
      * fix a small bug for eval_logger
      * Refactor make_table function to display points as " - " if value is None
      * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e'
      * Refactor ok_vqa_aggreate_submissions function
      * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15'
      * Refactor VQA submission file saving
      * Update file utils
      * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813'
      * Refactor file path handling and submission generation
      * OKVQA path
      * vizwizvqa file
      * pack cmmmu
      * fix a small metric bug for cmmmu
      * Add higher_is_better flag to submission metric
      * Add CMMMU dataset to README.md
      * Add logging and refactor submission file generation in docvqa utils.py
      * pack docvqa
      * add traceback to print detailed error
      * Refactor docvqa_test_aggregate_results to accept additional arguments
      * Add metric check in evaluator.py and update test.yaml and val.yaml
      * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
      * merge textvqa
      * textvqa
      * Modify submission file generation for COCO test results
      * Update test result storage path
      * update coco cap file name
      * Update COCO 2017 Caption dataset name
      * ferret
      * Add Ferret dataset
      * Refactor hb_doc_to_text function to include model-specific prompts
      * Add IconQA and its subtasks
      * Refactor image list creation in doc_to_visual function
      * Add process_results function to default template
      * Update process_results function in iconqa utils.py
      * refactor flickr30k
      * change aggregation function
      * Fix formatting issues and update logging message
      * Fix llava can not handle only text question (no visuals)
      * Fix qwen can not handle no image question (no visuals)
      * Add fuyu prepare accelerator scripts
      * refactor mme
      * naming consistency
      * aggregation_submissions consistency
      * flickr30k naming consistency
      * remove submissions for mme
      * remove unused submission function
      * Refactor infovqa_test.yaml and infovqa_val.yaml
      * Refactor code for improved readability and maintainability
      * stvqa
      * remane sqa
      * Update lmms_eval textcaps files and utils.py
      * Update default prompt for text captions
      * Refactor textcaps_aggregation_result function
      * Add generate_submission_file function and update mathvista_aggregate_results signature
      * Update nocaps_test.yaml and nocaps_val.yaml
      * refractor internal_eval
      * Add internal evaluation datasets
      * pack multidocvqa
      * mmvet
      * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
      * Refractor llava wild
      * Refractor llava-bench-coco
      * Add JSON file generation for gpt evaluation details
      * mmmu
      * Remove MMBench English and Chinese tasks
      * Remove unnecessary return statement in mmbench_aggregate_test_results function
      * Fix distributed process group initialization
      * Update dataset paths and group names in mmbench test configs
      * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
      * Add torch module import
      * lint
      * Remove IconQA dataset from README.md
      * Add Multi-DocVQA and its submodules
      * Add new datasets and update task names
      * Refactor flickr_aggregation_result function to accept additional arguments
      * Add timeout kwargs in Accelerator constructor
      * Add encoding to be utf-8 for cmmmu
      * Fix llava try and catch, remove torch.distributed.init in main
      * Ds prepare script for llava

      ---------

      Co-authored-by: JvThunder <[email protected]>
      Co-authored-by: kcz358 <[email protected]>

  commit 0e0c698
  Author: Li Bo <[email protected]>
  Date: Tue Feb 27 22:52:07 2024 +0800

      [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

      * Refactor logging in lmms_eval package
      * Refactor variable names in lmms_eval package

* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava

---------

Co-authored-by: JvThunder <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit 0182d5d
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
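One bullet in the squashed message above, "Refactor make_table function to display points as ' - ' if value is None", describes a small but easy-to-break formatting rule: submission-only tasks have no numeric score, so the table renderer must not assume every cell is a float. A minimal sketch of that behavior (the helper name `format_point` and its signature are hypothetical, not the actual lmms-eval code):

```python
def format_point(value, precision=4):
    """Render one metric cell for the results table.

    A None value (e.g. a submission-only task with no local score)
    is shown as " - " instead of crashing the float formatting.
    """
    if value is None:
        return " - "
    # Round-trip through an f-string with a configurable precision.
    return f"{value:.{precision}f}"


if __name__ == "__main__":
    print(format_point(None))        # placeholder cell
    print(format_point(0.5))         # fixed-precision score
    print(format_point(87.3, precision=2))
```

The guard-first shape matters: checking `value is None` before formatting keeps the table generation total over mixed result dictionaries, so one unscored task cannot abort the whole summary.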
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit 
'034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
kangreen0210
pushed a commit
to kangreen0210/LIME
that referenced
this pull request
Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 
'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
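Several of the fixes listed above ("Fix llava can not handle only text question (no visuals)", and this PR's nullable types in `llava_hf.py`) come down to letting the visuals argument be `None`. A minimal sketch of that nullable-typing pattern — the function name and prompt format here are hypothetical illustrations, not the actual `llava_hf.py` code:

```python
from typing import List, Optional

def build_prompt(question: str, visuals: Optional[List[str]] = None) -> str:
    """Build a prompt that tolerates a missing image list.

    Typing visuals as Optional[List[str]] (rather than List[str]) lets
    callers pass None for text-only questions without tripping type checks.
    """
    # Treat None or an empty list as a text-only question.
    if not visuals:
        return question
    # One <image> placeholder per visual, prepended to the question.
    return "<image>\n" * len(visuals) + question

# Text-only and image-bearing questions both work:
print(build_prompt("What is 2+2?"))                 # no <image> tokens
print(build_prompt("Describe this.", ["img.png"]))  # one <image> token
```

The same idea applies to return types: a method that may legitimately produce no value is annotated `Optional[T]` so that `None` is an expected, type-checked outcome rather than a runtime surprise.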
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
2024 +0000 Record gpt response in eval info commit 05166a14c45063bf108282c3202d32feb2fe0afa Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 5c6e0c8 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) commit 8bd568e Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
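For context on the fix this pull request tracks — allowing nullable values in the type annotations of `llava_hf.py` — the following is a minimal hedged sketch of what such a change typically looks like. The function name and parameters here are hypothetical illustrations, not the actual signatures in `llava_hf.py`:

```python
from typing import Optional

# Hypothetical example: before the fix, a parameter annotated plain `str`
# is rejected by type checkers when callers pass None (e.g. for text-only
# questions with no visuals). Annotating it Optional[str] — equivalent to
# Union[str, None] — makes the nullable case explicit and type-safe.
def build_prompt(context: str, question: Optional[str] = None) -> str:
    # Handle the nullable case explicitly instead of assuming a string.
    if question is None:
        return context
    return f"{context}\n{question}"
```

The same pattern applies to return types and container annotations (e.g. `Optional[List[str]]`) wherever a value may legitimately be absent.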
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Squashed commit of the following: commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) commit 0182d5d Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Add timeout to API requests * Fix error logging in get_chat_response function * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs commit b8b7f79 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) commit 09eecf5 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
912b73ed809e9242351874ce5b127c218188196d Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f3f98531fc18a053b1a1bdec6c03757e1334e93b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit ceccc944119c22177e7fe040ba73e468dcf6d419 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit d970b68e39068deb8308bb20af4266f4d37403df Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f0b9201adeb8e2e78886a6746ead6b585430f7d8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit f9cdc0331bf9ef3f1cca4a3791658b2f31f300ca Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit fb4bb090b185f18b8be4ef3353ec659a40e1b508 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3d58243e32f551f5427950663157c2a5ce539504 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 95717b7ce70d40bc12e0b3b5809a686a083903aa Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 07915d5ec5d68ed0cde34bbb6e0c1438757fab72 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit cc8ce2e48c31c5196ad5e0bca871acbe0c7492a1 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 562bb6c15876164ad49392df1a66ed6af84cac76 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f2a585a4e5163b51dc31686a32a8aae7fd8e0751 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit e3896d1421b5ba5794db227648ca4316a0170569 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0182d5d Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
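The PR title, "Fix types to allow nullables in llava_hf.py," refers to relaxing type hints so that optional arguments may be `None`. The actual diff is not shown on this page, so the sketch below is a hypothetical illustration of the pattern only — the function name `normalize_until` and its signature are assumptions, not code from llava_hf.py:

```python
from typing import List, Optional, Union


def normalize_until(until: Optional[Union[str, List[str]]]) -> List[str]:
    """Accept a nullable stop-sequence argument: None, one string, or a list.

    Annotating the parameter as Optional[...] (rather than a bare str or
    List[str]) is what lets callers legally pass None without type-checker
    errors -- the kind of fix this PR describes.
    """
    if until is None:
        return []            # no stop sequences requested
    if isinstance(until, str):
        return [until]       # promote a single string to a list
    return until             # already a list of stop strings
```

Running `mypy` over a call site that passes `None` would flag the stricter annotation but accept the `Optional` one, which is why these widened types matter even when the runtime behavior already handled `None`.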
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request — Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 
'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 1cf38b3ad6c7799957901d836299243cc21718f5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 62527c874431508b7731ad49ff1f1526104703cd Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 522f36aca8354f5efa7fff6d23bd90e885bcf1ab Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 4ee323a5b19382dbd9ba62f5002042d0746c374e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 3d3e164489cb4bd2db342ae085da9613ee7de660 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 8a4f586d7232a4d89977cef140900728d4517b72 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit 33dd5b0e0006882e735b7ea1908fdb6ad37c825a Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f19de3e7aaf5151d5ce9c63a2b9ee393c6282dfa Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e1f8cad15ddc2e385a3f2a778a4af57e1072987c Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 472b6b1ed2d5bc10ff1d6df8e435f33dc821ad4b Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 367c021bd50068baf024bea3afde4ed58aa38b81 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 0a466e16d983392cbf0580733500c0890521df93 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 6feceda2c1d631243c78fd7805dcdde4d0e8912f Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit db1f731ee5aff4618edefed018e982f83add0c9a Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c8a5e1129310ed1ce1fd86f43bb49da701140383 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit de53ceaeff08dc7c01962c704e06d7b87f804ec7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit cf18d7a1300311ffe1c9671fff7fa0c0d1cf2476 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 35e5a937bcf924d6b787ce37c6da9f0f54674da9 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 13179f9 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 9c0bc58 Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 39ce670fb1992c5e30d4b0eff9636a88a1ce83f5 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 36eeaa08730cd3e6a7e90e7000f61b4ebb075524 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 9ac7212 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 22fda28d8aa2a53405f15d179ea9baaf53a19b0b Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 48d92eb823b7929ea4c7b0da9f2284ec194c71cf Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit 1cf38b3ad6c7799957901d836299243cc21718f5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 62527c874431508b7731ad49ff1f1526104703cd Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 522f36aca8354f5efa7fff6d23bd90e885bcf1ab Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
4ee323a5b19382dbd9ba62f5002042d0746c374e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 3d3e164489cb4bd2db342ae085da9613ee7de660 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 8a4f586d7232a4d89977cef140900728d4517b72 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit 33dd5b0e0006882e735b7ea1908fdb6ad37c825a Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f19de3e7aaf5151d5ce9c63a2b9ee393c6282dfa Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e1f8cad15ddc2e385a3f2a778a4af57e1072987c Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 472b6b1ed2d5bc10ff1d6df8e435f33dc821ad4b Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 367c021bd50068baf024bea3afde4ed58aa38b81 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 0a466e16d983392cbf0580733500c0890521df93 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 6feceda2c1d631243c78fd7805dcdde4d0e8912f Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit db1f731ee5aff4618edefed018e982f83add0c9a Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c8a5e1129310ed1ce1fd86f43bb49da701140383 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit de53ceaeff08dc7c01962c704e06d7b87f804ec7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
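The type change this pull request tracks ("fix types to allow nullables in llava_hf.py") follows a common pattern: widen parameters from concrete types to `Optional[...]` and guard for `None` before use, which also covers the "llava can not handle only text question (no visuals)" fix mentioned in the log. A minimal sketch under assumed names (`build_prompt` is illustrative, not the actual `llava_hf.py` API):

```python
from typing import Optional

def build_prompt(context: str,
                 continuation: Optional[str] = None,
                 visuals: Optional[list] = None) -> str:
    """Assemble a prompt while tolerating a missing continuation or visuals.

    Previously-non-nullable arguments become Optional; None and an empty
    visuals list are both treated as a text-only query.
    """
    parts = [context]
    if continuation is not None:
        parts.append(continuation)
    if visuals:  # falsy covers both None and []
        # Prepend one image token per visual, mirroring the usual
        # "<image>" placeholder convention in llava-style prompts.
        parts.insert(0, "<image>" * len(visuals))
    return "".join(parts)
```

Callers can then pass `None` for absent pieces instead of sentinel strings or empty containers, and type checkers accept both call shapes.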
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 6bb0667ea746cc1dfa9442882f517edd47694d3e Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e4ab9fc9ec7d77850ecc05bd33256909cdf62513 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 74c28de92a5794054d7c937b727fba3a8e5821c3 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 279be1be1e2a839c97e58289362d6828e95e064a Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 666f3146feef55f898f710254824d4b2c57e6747 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 1f8d04d20feb6363615537ab47f8a1241c4ee692 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit 985194e49f519ce04bdc2c0ce00eee3ab6c02def Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit ef5a0a3b46acc36255c28781d8d66fc9bd32d47b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e793fd1da7416d7938a6f9e98728692c04264a97 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit c1ae0a853bfdcc7d59e3d9fa0eaa78d4d1f01336 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3ca0112d74b957f4d4ca20be5573deb8141793c7 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 821398fde93ccd52eac2f4bbfb8c2e787a10b987 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 5172c13fb3b212c0d175987727433320a1faacbc Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 12a243c8bee0be6ffacf17e46143519734c310d5 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 2aded15347d10078c49606b690d05935ad29e6d1 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9d499f198a9bdab2177bedfd3980c00934c684ff Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit c5431b5b80cbaf6e11d840ecb1d0734d680ac41b Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit 084a21394643acd741fe0969dd0d3f6c6c734853 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 803d0aec82a57de2ddf1527044f14ed968c30e25 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit c5344f6 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava

---------

Co-authored-by: JvThunder <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit fefc964
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>

commit 7009af6bc533534e249b3070f122d825ce738ba0
Merge: 83358a4 5e1c9c7
Author: kcz358 <[email protected]>
Date: Sun Mar 3 07:25:48 2024 +0000

Merge branch 'main' into kc/final_fix

commit 44b1e7fc5570130e64269c312c11fe0244c72c87
Author: kcz358 <[email protected]>
Date: Sun Mar 3 07:23:19 2024 +0000

Fix logging utils bug on wandb grouping

commit 34476c7
Author: kcz358 <[email protected]>
Date: Sun Mar 3 13:01:11 2024 +0800

[Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59)

commit b3f1eff
Author: Zhang Peiyuan <[email protected]>
Date: Thu Feb 29 13:40:02 2024 +0800

Dev/py add models (EvolvingLMMs-Lab#57)

commit 0f26c8a
Author: Pu Fanyi <[email protected]>
Date: Wed Feb 28 14:49:07 2024 +0800

Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

commit fefc964
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update

---------

Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
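Several of the commit subjects above (for example, "Fix llava can not handle only text question (no visuals)") and this pull request's title, "Fix types to allow nullables in llava_hf.py", describe the same kind of change: widening a signature so a missing image list is legal input. A minimal sketch of that pattern follows; the function and parameter names are hypothetical illustrations, not taken from the actual llava_hf.py.

```python
from typing import List, Optional

def build_prompt(question: str, visuals: Optional[List[str]] = None) -> str:
    """Build a model prompt, tolerating a missing (None) or empty visual list."""
    # Treat None the same as "no images attached" -- this is the nullable case
    # the type hint Optional[List[str]] makes explicit.
    if not visuals:
        return question
    # One placeholder token per attached image, then the question text.
    image_tokens = "<image>" * len(visuals)
    return f"{image_tokens}\n{question}"
```

With `Optional[...]` in the annotation, static checkers stop flagging callers that pass `None` for text-only questions, which is the gist of "allow nullables".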
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
…volvingLMMs-Lab#62)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Add timeout to API requests
* Fix error logging in get_chat_response function
* Remove unnecessary code and update dependencies
* Fix logging utils bug on wandb grouping
* Add reproduce envs
* Dev/readme rm rolling (EvolvingLMMs-Lab#60)
* remove log_likelyhood_rolling
* Update time efficiency benchmark in README.md
* add task guide

Squashed commit of the following:

commit fffe545
Author: Zhang Peiyuan <[email protected]>
Date: Thu Feb 29 13:40:02 2024 +0800

Dev/py add models (EvolvingLMMs-Lab#57)

commit c608dd6
Author: Pu Fanyi <[email protected]>
Date: Wed Feb 28 14:49:07 2024 +0800

Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

commit a0959f1
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

commit 4a1f385be0df3374ebf428599cfe35febdae0582
Merge: 2475639 f89a736
Author: kcz358 <[email protected]>
Date: Sun Mar 3 22:12:12 2024 +0800

Merge branch 'main' into kc/final_fix

commit 19f7d8cd771fddd6cc6c3fee8f3c51fa4ad83eaa
Author: kcz358 <[email protected]>
Date: Sun Mar 3 22:11:04 2024 +0800

Add reproduce envs

commit 1b605af
Author: kcz358 <[email protected]>
Date: Sun Mar 3 21:19:15 2024 +0800

[Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61)

---------

Co-authored-by: jzhang38 <[email protected]>
Co-authored-by: kcz358 <[email protected]>
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit d1fffce8c61bd7e1e32f76c953c5b26773be58d5 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 5a4df5d39e813844002af1a02ef4ce0c69feaa6d Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit b923ad1 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 7f852ee91653357b6ee954ec92bcf2e5bab4bbcf Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 79c737c915565b191ab29113c98615a1c6acc994 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit decb360fd834d968cc59dee6a06d40a326177ec5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 1e1ecf0de94b5e493ce0590269b3a2b9d030e31d Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit ade2b08994f0b92f20d373cbc3cc8e2a8b665f49 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
3bca65bca4b9b4cab80d50172dabda5c549c539f Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 0ee12be56664eac6a79599b48ea22985f18ec358 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 62cb1058ac416027ad981e3ba31ce029dfe83cf3 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit 55447e7039321ed8d46c8dccaf75113288bdb502 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 26963ddb5315e39ad9142e0fa1391fe2b8201c54 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 7abbd695dfa73e09687a4d4f73c6bc99e63c811a Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit d1b19061661c1da1d3b7e9cd5d126ec475b6e1de Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 2564e74c7e8c07a51200560be70d2be13501fd9a Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 4885702fcd36cfaf5bf2e498621fa0a831e8617c Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 46fc13424e6fecaa15d290f2330bc440ce9bd6e6 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 7cbeb3a05fc13fa9d0d44a17a7cd25e7550c435b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit e8a88505cbd71029682eaaddc8fe2c5cd41ccf5d Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit efc341983b959fb2cc9cc208879a86a01c251494 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit 2b92f718f478b9f7999b17560439db366d2165a3 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
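Several commits in the log above address the same failure mode ("Fix llava can not handle only text question (no visuals)", and this PR's nullable-type fix in `llava_hf.py`): a prompt builder that assumes a visuals list is always present crashes on text-only questions. A minimal sketch of that kind of guard, assuming a hypothetical `build_prompt` helper — the name and signature are illustrative, not the repository's actual code:

```python
from typing import List, Optional


def build_prompt(question: str, visuals: Optional[List[object]] = None) -> str:
    """Build a model prompt, tolerating a missing or empty visuals list."""
    # Text-only question: return it unchanged, with no <image> placeholders.
    if not visuals:
        return question
    # One <image> placeholder per visual, followed by the question text.
    return "<image>\n" * len(visuals) + question
```

Typing the parameter as `Optional[...] = None` (rather than a bare `List`) means callers that pass no visuals neither trip a type checker nor hit a `len(None)` crash at runtime.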
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 74a747ff5e5a82cd8f61fb9f5a5357b67c867153 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 336de4a8408ece3c0a2b7b5880c00b38015674a1 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 5860f00373890a18ed09870757bcdae9f3821aa1 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 912b73ed809e9242351874ce5b127c218188196d Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f3f98531fc18a053b1a1bdec6c03757e1334e93b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit ceccc944119c22177e7fe040ba73e468dcf6d419 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit d970b68e39068deb8308bb20af4266f4d37403df Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f0b9201adeb8e2e78886a6746ead6b585430f7d8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit f9cdc0331bf9ef3f1cca4a3791658b2f31f300ca Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit fb4bb090b185f18b8be4ef3353ec659a40e1b508 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3d58243e32f551f5427950663157c2a5ce539504 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 95717b7ce70d40bc12e0b3b5809a686a083903aa Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 07915d5ec5d68ed0cde34bbb6e0c1438757fab72 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit cc8ce2e48c31c5196ad5e0bca871acbe0c7492a1 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 562bb6c15876164ad49392df1a66ed6af84cac76 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f2a585a4e5163b51dc31686a32a8aae7fd8e0751 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit e3896d1421b5ba5794db227648ca4316a0170569 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit d1d4ca79d569d5765080160bd8c7e8d432cadd99 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit d1815c3465e43a083ab811e8fc8602911a971413 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 27dbf48 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit f6a7654 Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit bf67bcc02cb57e63952e4429515269458084ea5f Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit c3e54461dd77f62aa50bcee8fbbebc14e4470644 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 2a94fb0 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit a0ce88c84a9122b793a6b6d352896767fed1f18a Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit b892d8eac7f656fafa5d6425b94b3d089e4a5268 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit 74a747ff5e5a82cd8f61fb9f5a5357b67c867153 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 336de4a8408ece3c0a2b7b5880c00b38015674a1 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 5860f00373890a18ed09870757bcdae9f3821aa1 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
912b73ed809e9242351874ce5b127c218188196d Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f3f98531fc18a053b1a1bdec6c03757e1334e93b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit ceccc944119c22177e7fe040ba73e468dcf6d419 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit d970b68e39068deb8308bb20af4266f4d37403df Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f0b9201adeb8e2e78886a6746ead6b585430f7d8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit f9cdc0331bf9ef3f1cca4a3791658b2f31f300ca Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit fb4bb090b185f18b8be4ef3353ec659a40e1b508 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3d58243e32f551f5427950663157c2a5ce539504 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 95717b7ce70d40bc12e0b3b5809a686a083903aa Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 07915d5ec5d68ed0cde34bbb6e0c1438757fab72 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit cc8ce2e48c31c5196ad5e0bca871acbe0c7492a1 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 562bb6c15876164ad49392df1a66ed6af84cac76 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f2a585a4e5163b51dc31686a32a8aae7fd8e0751 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit e3896d1421b5ba5794db227648ca4316a0170569 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
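The commit history above repeatedly references this pull request, "Fix types to allow nullables in llava_hf.py". As a rough illustration of what such a typing fix looks like, the sketch below widens a parameter annotation with `Optional` so that callers may pass `None` (e.g. text-only questions with no visuals). The function name and signature here are hypothetical, not the actual `llava_hf.py` API:

```python
from typing import List, Optional


def generate_until(
    contexts: List[str],
    visuals: Optional[List[object]] = None,  # None when a question has no image
) -> List[str]:
    """Hypothetical sketch: accept None for visuals instead of requiring a list."""
    if visuals is None:
        visuals = []  # normalize so downstream code can iterate safely
    return [f"answered: {c} ({len(visuals)} images)" for c in contexts]
```

Before a change like this, a plain `List[object]` annotation would make type checkers reject `None`, even though the no-visuals case is legitimate for text-only prompts.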
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 
'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 1cf38b3ad6c7799957901d836299243cc21718f5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 62527c874431508b7731ad49ff1f1526104703cd Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 522f36aca8354f5efa7fff6d23bd90e885bcf1ab Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 4ee323a5b19382dbd9ba62f5002042d0746c374e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 3d3e164489cb4bd2db342ae085da9613ee7de660 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 8a4f586d7232a4d89977cef140900728d4517b72 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit 33dd5b0e0006882e735b7ea1908fdb6ad37c825a Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f19de3e7aaf5151d5ce9c63a2b9ee393c6282dfa Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e1f8cad15ddc2e385a3f2a778a4af57e1072987c Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 472b6b1ed2d5bc10ff1d6df8e435f33dc821ad4b Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 367c021bd50068baf024bea3afde4ed58aa38b81 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 0a466e16d983392cbf0580733500c0890521df93 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 6feceda2c1d631243c78fd7805dcdde4d0e8912f Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit db1f731ee5aff4618edefed018e982f83add0c9a Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c8a5e1129310ed1ce1fd86f43bb49da701140383 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit de53ceaeff08dc7c01962c704e06d7b87f804ec7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit cf18d7a1300311ffe1c9671fff7fa0c0d1cf2476 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 35e5a937bcf924d6b787ce37c6da9f0f54674da9 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 3c741a5 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit a68962a Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 39ce670fb1992c5e30d4b0eff9636a88a1ce83f5 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 36eeaa08730cd3e6a7e90e7000f61b4ebb075524 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 9eb42de Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 22fda28d8aa2a53405f15d179ea9baaf53a19b0b Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 48d92eb823b7929ea4c7b0da9f2284ec194c71cf Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit 1cf38b3ad6c7799957901d836299243cc21718f5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 62527c874431508b7731ad49ff1f1526104703cd Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 522f36aca8354f5efa7fff6d23bd90e885bcf1ab Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
4ee323a5b19382dbd9ba62f5002042d0746c374e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 3d3e164489cb4bd2db342ae085da9613ee7de660 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 8a4f586d7232a4d89977cef140900728d4517b72 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit 33dd5b0e0006882e735b7ea1908fdb6ad37c825a Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f19de3e7aaf5151d5ce9c63a2b9ee393c6282dfa Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e1f8cad15ddc2e385a3f2a778a4af57e1072987c Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 472b6b1ed2d5bc10ff1d6df8e435f33dc821ad4b Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 367c021bd50068baf024bea3afde4ed58aa38b81 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 0a466e16d983392cbf0580733500c0890521df93 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 6feceda2c1d631243c78fd7805dcdde4d0e8912f Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit db1f731ee5aff4618edefed018e982f83add0c9a Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c8a5e1129310ed1ce1fd86f43bb49da701140383 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit de53ceaeff08dc7c01962c704e06d7b87f804ec7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
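The PR title describes widening type hints in `llava_hf.py` so that arguments may be `None` (e.g. for text-only questions with no visuals, a failure mode the commit log above also mentions). A minimal sketch of that pattern, assuming hypothetical names — `build_prompt` and `visuals` are illustrative, not the file's actual API:

```python
# Hedged sketch: allowing a nullable argument via Optional, in the spirit of
# "Fix types to allow nullables". Names here are illustrative assumptions.
from typing import List, Optional


def build_prompt(question: str, visuals: Optional[List[str]] = None) -> str:
    """Build a prompt; supports text-only questions when visuals is None."""
    if not visuals:  # covers both None and an empty list
        return question
    image_tokens = " ".join("<image>" for _ in visuals)
    return f"{image_tokens}\n{question}"


print(build_prompt("What is shown?", ["img.png"]))  # image-grounded prompt
print(build_prompt("2 + 2 = ?"))  # text-only: no <image> token added
```

Annotating the parameter as `Optional[List[str]]` rather than `List[str]` lets static checkers accept `None` callers without changing runtime behavior.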
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
…volvingLMMs-Lab#62) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 2782eb0 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 7e8d3e4 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 4fa73ba Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit e873012d0da2711f2076f7c09f390901f89da2f9 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 621cdd663e0197827a5792872f13cdf3d27d2813 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 6daf75c54fe3d45970c5d35a10000f10c1420c6b Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 2a7a03205a2514fe0322ab4aa05c4948f9233109 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit a99850057224596d01835fface39d4aafd79de3e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit dddd0276003115c8a150a78eb3ae7bd299c460e4 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit bcffe0b45083f48886e18d5ece5f2504b96bbcbd Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data
commit f6705996b992363f2fd3c5dedb90e1bd51d04426 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 9290fc1c27ecca86f7ec3df0d932c7fa228e19c9 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type
commit 2fceaaf8f855d08d642996cd217ec0f6fc0fa04c Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names
commit 33c0a81c91733e9aabe214f0797be2fdd3df1f1c Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path
commit 90ad0ace136a35ecc16a09ce841736842f7eb6dd Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path
commit 15b0336a932ef1823696e63672837700ce4fdae9 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict
commit f75e7cfd35b1ee814f86abb9d4fbace027c00941 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 06c51ea7682e31964ca720a8a40705a3a7f3f360 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info
commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path

commit 2782eb0 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57)
* add instructblip
* minicpm_v
* remove <image> from qwen-vl
* speed up postprocessing
* Optimize build context speed
Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]>

commit 7e8d3e4 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)
* refactor vizwizvqa task
* Delete vqav2_test and vqav2_val YAML files
* Refactor vqav2_process_results functions
* Add a pack for vqav2
* refactor okvqa
* roll back vizwiz_vqa
* Fix exact_match calculation in ok_vqa_process_results
* Update OKVQA dataset name in readme
* add model_specific_prompt_kwargs
* add model_specific_prompt_kwargs to vizwiz_vqa
* add model_specific_prompt_kwargs for vqav2
* lint
* fix a small bug for eval_logger
* Refactor make_table function to display points as " - " if value is None
* Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'
* Refactor ok_vqa_aggreate_submissions function
* Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'
* Refactor VQA submission file saving
* Update file utils
* Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava
Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]>

commit 4fa73ba Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package

* Fix small bugs in list_with_num
* Revise list_with_num model args
* Dev/readme rm rolling (EvolvingLMMs-Lab#60)
* remove log_likelyhood_rolling
* Update time efficiency benchmark in README.md
* add task guide
Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
* Remove unnecessary code and update dependencies
* Fix logging utils bug on wandb grouping
* Add reproduce envs
* Squashed commit of the following:

commit bf49a3e1de8431193bdf6f7688a4ff7f4683a84d Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix
commit b535df91bc792b3b2b296572ec4692c75fdfe878 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs

commit d0539a0 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61)
* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Fix logging utils bug on wandb grouping
Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>

commit 7dc049915a1846177e0f9f8eab12366881f82157 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix
commit 5ec98efc7b666341adc726b8d1d4779b6c543f7f Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping
commit 105d781 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59)
commit 8263ca91c87a127d992dd01bdac5f89b8a5ff521 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args
commit c413569d46be0ad604cd249df8bd58ffe26c0e39 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num
commit e873012d0da2711f2076f7c09f390901f89da2f9 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval
commit 621cdd663e0197827a5792872f13cdf3d27d2813 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix
commit 6daf75c54fe3d45970c5d35a10000f10c1420c6b Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text
commit 2a7a03205a2514fe0322ab4aa05c4948f9233109 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things
commit a99850057224596d01835fface39d4aafd79de3e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing
commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit.

* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update
---------
Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]>
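Several commits above deal with inputs that may legitimately be absent, e.g. "Fix llava can not handle only text question (no visuals)". A minimal sketch of the nullable-parameter pattern those fixes follow, using a hypothetical helper (this is not the actual llava_hf.py API):

```python
from typing import List, Optional

def build_prompt(question: str, image_tokens: Optional[List[str]] = None) -> str:
    """Build a model prompt, tolerating text-only questions.

    Annotating image_tokens as Optional[List[str]] (rather than List[str])
    documents that callers may pass None when a question has no visuals,
    and the explicit None check below handles that case instead of crashing.
    """
    if image_tokens is None:  # text-only question: no image placeholders
        image_tokens = []
    return " ".join(image_tokens + [question])

# Usage: with and without visuals.
text_only = build_prompt("What is shown?")
with_image = build_prompt("What is shown?", ["<image>"])
```

The same idea applies to generation-kwargs dicts and visual lists in the evaluation harness: widen the annotation to `Optional[...]`, then normalize `None` to a sensible default at the top of the function.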
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Squashed commit of the following:

commit 5c6e0c8 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57)
commit 8bd568e Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)
commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Add timeout to API requests
* Fix error logging in get_chat_response function
* Squashed commit of the following:

commit 584db7fcc0140dd4a6d6481529ae90570b2912c4 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval
commit 5e52a8df3785eb2d1b392eb164b66e92c9dadb02 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix
commit a3cae8e9f3570121d51885c71f7081da36c5d13d Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text
commit 0b3cad596fd58e6414ea015e79bef1eea6eb7f7a Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things
commit f436cb65bd716d93044516ece2133ab5b8d87137 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing
commit 3d47b59f92cef22cfe38e00b407ce38a61d538b2 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit.
commit ffb9eb26dae25cda1e0d3e302852862102b47054 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 1700786b572cbedcb6969ae97028225d388987bb Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data
commit 786f2b53d57265b9900b0718d27538221b5f81b4 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 888c1c128319bd04528727a309d0d92aaee9e752 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type
commit 8c74caa2f77940c781501b45571d7c6362c9a6c8 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names
commit 4ab5cc32e3a460ad112dcd3031cea55b6bc0f691 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path
commit eae08b536908875eeb600538e853caaa14c655ae Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path
commit 4240785c1bf3a7fd15f36013803c004542a17f2e Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict
commit c15bf75d2f76e215b4d5de43c1d17b5a41d79753 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit 93534dc4e98b78b9da01099079187d8960705fb8 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info
commit 05166a14c45063bf108282c3202d32feb2fe0afa Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit 172a002845728f263a9221206aeab62bdc1070dc Merge: 2475639 2152f18 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 2475639fcf9164a7965b080c31dc50bc856fa053 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 2152f18 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 5c6e0c8 Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 8bd568e Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15' * Refactor VQA submission file saving * Update file utils * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 5902608191d5a8a059c2a267afc0100f47140fae Merge: 83358a4 fd7773d Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 83358a42354d8ec57d3d887e2262f82e7dd4c532 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit fd7773d Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 5c6e0c8 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
8bd568e Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15' * Refactor VQA submission file saving * Update file utils * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit ce51924783fa5c50f99815a33988476ee1220bac Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit a288035b48620b827a82c1c45412fe2bb3c18715 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit 584db7fcc0140dd4a6d6481529ae90570b2912c4 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 5e52a8df3785eb2d1b392eb164b66e92c9dadb02 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit a3cae8e9f3570121d51885c71f7081da36c5d13d Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
0b3cad596fd58e6414ea015e79bef1eea6eb7f7a Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f436cb65bd716d93044516ece2133ab5b8d87137 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 3d47b59f92cef22cfe38e00b407ce38a61d538b2 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit ffb9eb26dae25cda1e0d3e302852862102b47054 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 1700786b572cbedcb6969ae97028225d388987bb Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 786f2b53d57265b9900b0718d27538221b5f81b4 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 888c1c128319bd04528727a309d0d92aaee9e752 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 8c74caa2f77940c781501b45571d7c6362c9a6c8 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 4ab5cc32e3a460ad112dcd3031cea55b6bc0f691 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit eae08b536908875eeb600538e853caaa14c655ae Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 4240785c1bf3a7fd15f36013803c004542a17f2e Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c15bf75d2f76e215b4d5de43c1d17b5a41d79753 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 93534dc4e98b78b9da01099079187d8960705fb8 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit 05166a14c45063bf108282c3202d32feb2fe0afa Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 5c6e0c8 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 8bd568e Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '63fc8eee4dddfbe741e5a862e5ff30d19c34238e' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'd16bbce134d453c624834e090af1e0f869fdde15' * Refactor VQA submission file saving * Update file utils * Merge commit '7332704263a45ab6fa69aad0c4303cd9cbc26813' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0e0c698 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
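The type fix this pull request is titled after ("Fix types to allow nullables in llava_hf.py") amounts to widening strict parameter annotations so that `None` is a legal value. The sketch below illustrates the general pattern only; the function name, parameters, and defaults are hypothetical and are not taken from the actual `llava_hf.py` source:

```python
from typing import Optional

# Hypothetical sketch of the "allow nullables" pattern: parameters that were
# annotated as plain str/int are widened to Optional[...] so callers may pass
# None, and None is then resolved to a sensible default inside the function.
def load_model_config(
    pretrained: str,
    revision: Optional[str] = None,        # nullable: fall back to the default branch
    device_map: Optional[str] = None,      # nullable: let the framework choose placement
    max_new_tokens: Optional[int] = None,  # nullable: use the generation default
) -> dict:
    return {
        "pretrained": pretrained,
        "revision": revision if revision is not None else "main",
        "device_map": device_map if device_map is not None else "auto",
        "max_new_tokens": max_new_tokens if max_new_tokens is not None else 128,
    }
```

With `Optional[...]` annotations, a static type checker accepts `load_model_config("model", device_map=None)` instead of flagging it, while explicit values still pass through unchanged.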
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0182d5d Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 74a747ff5e5a82cd8f61fb9f5a5357b67c867153 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 336de4a8408ece3c0a2b7b5880c00b38015674a1 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 5860f00373890a18ed09870757bcdae9f3821aa1 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 912b73ed809e9242351874ce5b127c218188196d Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f3f98531fc18a053b1a1bdec6c03757e1334e93b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit ceccc944119c22177e7fe040ba73e468dcf6d419 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit d970b68e39068deb8308bb20af4266f4d37403df Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit f0b9201adeb8e2e78886a6746ead6b585430f7d8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data
commit f9cdc0331bf9ef3f1cca4a3791658b2f31f300ca Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit fb4bb090b185f18b8be4ef3353ec659a40e1b508 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type
commit 3d58243e32f551f5427950663157c2a5ce539504 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names
commit 95717b7ce70d40bc12e0b3b5809a686a083903aa Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path
commit 07915d5ec5d68ed0cde34bbb6e0c1438757fab72 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path
commit cc8ce2e48c31c5196ad5e0bca871acbe0c7492a1 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict
commit 562bb6c15876164ad49392df1a66ed6af84cac76 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme
commit f2a585a4e5163b51dc31686a32a8aae7fd8e0751 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info
commit e3896d1421b5ba5794db227648ca4316a0170569 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path
commit 90fbf3d Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57)
commit 0fa3bce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)
commit 0182d5d Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)
* Fix small bugs in list_with_num
* Revise list_with_num model args
* Dev/readme rm rolling (EvolvingLMMs-Lab#60)
* remove log_likelyhood_rolling
* Update time efficiency benchmark in README.md
* add task guide
--------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
* Remove unnecessary code and update dependencies
* Fix logging utils bug on wandb grouping
* Add reproduce envs
* Squashed commit of the following:
commit d1d4ca79d569d5765080160bd8c7e8d432cadd99 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix
commit d1815c3465e43a083ab811e8fc8602911a971413 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs
commit b8b7f79 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61)
commit bf67bcc02cb57e63952e4429515269458084ea5f Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix
commit c3e54461dd77f62aa50bcee8fbbebc14e4470644 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping
commit 09eecf5 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59)
commit a0ce88c84a9122b793a6b6d352896767fed1f18a Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args
commit b892d8eac7f656fafa5d6425b94b3d089e4a5268 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num
* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update
--------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
* Squashed commit of the following:
commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57)
commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)
commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)
commit 1cf38b3ad6c7799957901d836299243cc21718f5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval
commit 62527c874431508b7731ad49ff1f1526104703cd Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix
commit 522f36aca8354f5efa7fff6d23bd90e885bcf1ab Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text
commit 4ee323a5b19382dbd9ba62f5002042d0746c374e Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things
commit 3d3e164489cb4bd2db342ae085da9613ee7de660 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing
commit 8a4f586d7232a4d89977cef140900728d4517b72 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit.
commit 33dd5b0e0006882e735b7ea1908fdb6ad37c825a Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f19de3e7aaf5151d5ce9c63a2b9ee393c6282dfa Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e1f8cad15ddc2e385a3f2a778a4af57e1072987c Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 472b6b1ed2d5bc10ff1d6df8e435f33dc821ad4b Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 367c021bd50068baf024bea3afde4ed58aa38b81 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 0a466e16d983392cbf0580733500c0890521df93 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 6feceda2c1d631243c78fd7805dcdde4d0e8912f Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit db1f731ee5aff4618edefed018e982f83add0c9a Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit c8a5e1129310ed1ce1fd86f43bb49da701140383 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit de53ceaeff08dc7c01962c704e06d7b87f804ec7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 9c0bc58 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 30ab0ce Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) commit a5b07ee Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit cf18d7a1300311ffe1c9671fff7fa0c0d1cf2476 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 35e5a937bcf924d6b787ce37c6da9f0f54674da9 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 13179f9 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) commit 39ce670fb1992c5e30d4b0eff9636a88a1ce83f5 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 36eeaa08730cd3e6a7e90e7000f61b4ebb075524 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 9ac7212 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) commit 22fda28d8aa2a53405f15d179ea9baaf53a19b0b Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 48d92eb823b7929ea4c7b0da9f2284ec194c71cf Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Squashed commit of the following: commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 6bb0667ea746cc1dfa9442882f517edd47694d3e Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e4ab9fc9ec7d77850ecc05bd33256909cdf62513 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 74c28de92a5794054d7c937b727fba3a8e5821c3 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 279be1be1e2a839c97e58289362d6828e95e064a Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 666f3146feef55f898f710254824d4b2c57e6747 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 1f8d04d20feb6363615537ab47f8a1241c4ee692 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit 985194e49f519ce04bdc2c0ce00eee3ab6c02def Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit ef5a0a3b46acc36255c28781d8d66fc9bd32d47b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e793fd1da7416d7938a6f9e98728692c04264a97 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit c1ae0a853bfdcc7d59e3d9fa0eaa78d4d1f01336 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3ca0112d74b957f4d4ca20be5573deb8141793c7 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 821398fde93ccd52eac2f4bbfb8c2e787a10b987 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 5172c13fb3b212c0d175987727433320a1faacbc Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 12a243c8bee0be6ffacf17e46143519734c310d5 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 2aded15347d10078c49606b690d05935ad29e6d1 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9d499f198a9bdab2177bedfd3980c00934c684ff Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit c5431b5b80cbaf6e11d840ecb1d0734d680ac41b Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit 084a21394643acd741fe0969dd0d3f6c6c734853 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 803d0aec82a57de2ddf1527044f14ed968c30e25 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit c5344f6 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 7009af6bc533534e249b3070f122d825ce738ba0 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 44b1e7fc5570130e64269c312c11fe0244c72c87 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 34476c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit b3f1eff Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
0f26c8a Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit fefc964 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit be339f832f760190e81bbfbeffb7049f7cccee60 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit f301a5614054538cd7c18d3ac7b1f02305e68224 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit 6bb0667ea746cc1dfa9442882f517edd47694d3e Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit e4ab9fc9ec7d77850ecc05bd33256909cdf62513 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 74c28de92a5794054d7c937b727fba3a8e5821c3 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
279be1be1e2a839c97e58289362d6828e95e064a Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 666f3146feef55f898f710254824d4b2c57e6747 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 1f8d04d20feb6363615537ab47f8a1241c4ee692 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit 985194e49f519ce04bdc2c0ce00eee3ab6c02def Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit ef5a0a3b46acc36255c28781d8d66fc9bd32d47b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit e793fd1da7416d7938a6f9e98728692c04264a97 Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit c1ae0a853bfdcc7d59e3d9fa0eaa78d4d1f01336 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3ca0112d74b957f4d4ca20be5573deb8141793c7 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 821398fde93ccd52eac2f4bbfb8c2e787a10b987 Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 5172c13fb3b212c0d175987727433320a1faacbc Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 12a243c8bee0be6ffacf17e46143519734c310d5 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 2aded15347d10078c49606b690d05935ad29e6d1 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9d499f198a9bdab2177bedfd3980c00934c684ff Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000

Record gpt response in eval info

commit c5431b5b80cbaf6e11d840ecb1d0734d680ac41b
Author: kcz358 <[email protected]>
Date: Fri Mar 1 07:49:01 2024 +0000

Fix refcocog dataset path

commit b3f1eff
Author: Zhang Peiyuan <[email protected]>
Date: Thu Feb 29 13:40:02 2024 +0800

Dev/py add models (EvolvingLMMs-Lab#57)

* add instructblip
* minicpm_v
* remove <image> from qwen-vl
* speed up postprocessing
* Optimize build context speed

---------

Co-authored-by: Pu Fanyi <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit 0f26c8a
Author: Pu Fanyi <[email protected]>
Date: Wed Feb 28 14:49:07 2024 +0800

Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

* refactor vizwizvqa task
* Delete vqav2_test and vqav2_val YAML files
* Refactor vqav2_process_results functions
* Add a pack for vqav2
* refactor okvqa
* roll back vizwiz_vqa
* Fix exact_match calculation in ok_vqa_process_results
* Update OKVQA dataset name in readme
* add model_specific_prompt_kwargs
* add model_specific_prompt_kwargs to vizwiz_vqa
* add model_specific_prompt_kwargs for vqav2
* lint
* fix a small bug for eval_logger
* Refactor make_table function to display points as " - " if value is None
* Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'
* Refactor ok_vqa_aggreate_submissions function
* Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'
* Refactor VQA submission file saving
* Update file utils
* Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava

---------

Co-authored-by: JvThunder <[email protected]>
Co-authored-by: kcz358 <[email protected]>

commit fefc964
Author: Li Bo <[email protected]>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update

---------

Co-authored-by: Fanyi Pu <[email protected]>
Co-authored-by: kcz358 <[email protected]>
Co-authored-by: jzhang38 <[email protected]>
Co-authored-by: kcz358 <[email protected]>
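The commit log above repeatedly deals with absent inputs ("Fix llava can not handle only text question (no visuals)"), which is also what this pull request's type fix in `llava_hf.py` addresses: parameters that may legitimately be `None` need nullable annotations. A minimal sketch of that pattern, with a hypothetical function name and parameters rather than the actual model code:

```python
from typing import List, Optional

# Hypothetical helper illustrating nullable-friendly annotations: a bare
# `visuals: List[str]` would reject the `None` default that text-only
# questions rely on, so the parameters are declared Optional and the
# None cases are handled explicitly.
def build_context(prompt: str,
                  visuals: Optional[List[str]] = None,
                  max_new_tokens: Optional[int] = None) -> str:
    images = visuals if visuals is not None else []          # text-only question
    limit = max_new_tokens if max_new_tokens is not None else 128  # model default
    return f"{len(images)} image(s), limit {limit}"
```

With `Optional[...]` in place, a type checker accepts both `build_context("q")` and `build_context("q", ["img.png"], 32)` without complaint.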
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit 
'034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit decb360fd834d968cc59dee6a06d40a326177ec5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 1e1ecf0de94b5e493ce0590269b3a2b9d030e31d Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit ade2b08994f0b92f20d373cbc3cc8e2a8b665f49 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 3bca65bca4b9b4cab80d50172dabda5c549c539f Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 0ee12be56664eac6a79599b48ea22985f18ec358 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 62cb1058ac416027ad981e3ba31ce029dfe83cf3 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit 55447e7039321ed8d46c8dccaf75113288bdb502 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 26963ddb5315e39ad9142e0fa1391fe2b8201c54 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 7abbd695dfa73e09687a4d4f73c6bc99e63c811a Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit d1b19061661c1da1d3b7e9cd5d126ec475b6e1de Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 2564e74c7e8c07a51200560be70d2be13501fd9a Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 4885702fcd36cfaf5bf2e498621fa0a831e8617c Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 46fc13424e6fecaa15d290f2330bc440ce9bd6e6 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 7cbeb3a05fc13fa9d0d44a17a7cd25e7550c435b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit e8a88505cbd71029682eaaddc8fe2c5cd41ccf5d Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit efc341983b959fb2cc9cc208879a86a01c251494 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit 2b92f718f478b9f7999b17560439db366d2165a3 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up 
postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in 
doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder 
<[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit 4a1f385be0df3374ebf428599cfe35febdae0582 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit 19f7d8cd771fddd6cc6c3fee8f3c51fa4ad83eaa Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 1b605af Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit fffe545 Author: Zhang 
Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset 
name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor 
* Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit d1fffce8c61bd7e1e32f76c953c5b26773be58d5 Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 5a4df5d39e813844002af1a02ef4ce0c69feaa6d Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit b923ad1 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 
c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change 
aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb 
Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit 7f852ee91653357b6ee954ec92bcf2e5bab4bbcf Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit 79c737c915565b191ab29113c98615a1c6acc994 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit decb360fd834d968cc59dee6a06d40a326177ec5 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 1e1ecf0de94b5e493ce0590269b3a2b9d030e31d Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit ade2b08994f0b92f20d373cbc3cc8e2a8b665f49 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 
3bca65bca4b9b4cab80d50172dabda5c549c539f Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit 0ee12be56664eac6a79599b48ea22985f18ec358 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit 62cb1058ac416027ad981e3ba31ce029dfe83cf3 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. commit 55447e7039321ed8d46c8dccaf75113288bdb502 Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 26963ddb5315e39ad9142e0fa1391fe2b8201c54 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit 7abbd695dfa73e09687a4d4f73c6bc99e63c811a Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit d1b19061661c1da1d3b7e9cd5d126ec475b6e1de Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 2564e74c7e8c07a51200560be70d2be13501fd9a Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 4885702fcd36cfaf5bf2e498621fa0a831e8617c Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 46fc13424e6fecaa15d290f2330bc440ce9bd6e6 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 7cbeb3a05fc13fa9d0d44a17a7cd25e7550c435b Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit e8a88505cbd71029682eaaddc8fe2c5cd41ccf5d Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit efc341983b959fb2cc9cc208879a86a01c251494 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 
2024 +0000 Record gpt response in eval info commit 2b92f718f478b9f7999b17560439db366d2165a3 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit fffe545 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit c608dd6 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'b636596c46dce543cdfacc0809c5b14edafcf1fd' * Refactor ok_vqa_aggreate_submissions function * Merge commit '5624cd5b92ff6b1bc1d431a615d938fd623a03c4' * Refactor VQA submission file saving * Update file utils * Merge commit '034d73b022739333da5e60f432330b8ea832ef9b' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit a0959f1 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
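The titular change of this PR ("Fix types to allow nullables in llava_hf.py") amounts to loosening type hints on parameters that may legitimately be `None`. As a hedged illustration (not the actual `llava_hf.py` code; the function name and parameters below are hypothetical), annotating such a parameter with `Optional` makes the nullable default explicit:

```python
from typing import Optional

# Hypothetical sketch, not the actual llava_hf.py code: a parameter that
# defaults to None is annotated Optional[str] rather than bare str, so a
# type checker accepts the None default and callers may omit the argument.
def build_prompt(question: str, context: Optional[str] = None) -> str:
    # Fall back to the bare question when no context is supplied.
    if context is None:
        return question
    return f"{context}\n{question}"
```

With a bare `str` annotation, `context: str = None` is flagged by strict checkers such as mypy; `Optional[str]` (or `str | None` on Python 3.10+) states the nullable intent directly.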
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
* Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit 
'560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet 
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Add timeout to API requests * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix error logging in get_chat_response function * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Squashed commit of the following: commit 74a747ff5e5a82cd8f61fb9f5a5357b67c867153 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 336de4a8408ece3c0a2b7b5880c00b38015674a1 Merge: a3cae8e ffb9eb2 Author: kcz358 <[email protected]> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 5860f00373890a18ed09870757bcdae9f3821aa1 Author: kcz358 <[email protected]> Date: Sat Mar 2 02:47:31 2024 +0000 Fix seedbench2 image issue in doc_to_text commit 912b73ed809e9242351874ce5b127c218188196d Author: kcz358 <[email protected]> Date: Fri Mar 1 15:32:49 2024 +0000 Delete unnecessary things commit f3f98531fc18a053b1a1bdec6c03757e1334e93b Author: kcz358 <[email protected]> Date: Fri Mar 1 15:31:42 2024 +0000 Testing commit ceccc944119c22177e7fe040ba73e468dcf6d419 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:29:30 2024 +0000 Forcing an empty commit. 
commit d970b68e39068deb8308bb20af4266f4d37403df Merge: 786f2b5 1700786 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f0b9201adeb8e2e78886a6746ead6b585430f7d8 Author: kcz358 <[email protected]> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit f9cdc0331bf9ef3f1cca4a3791658b2f31f300ca Merge: 4240785 888c1c1 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit fb4bb090b185f18b8be4ef3353ec659a40e1b508 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 3d58243e32f551f5427950663157c2a5ce539504 Author: kcz358 <[email protected]> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 95717b7ce70d40bc12e0b3b5809a686a083903aa Author: kcz358 <[email protected]> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 07915d5ec5d68ed0cde34bbb6e0c1438757fab72 Author: kcz358 <[email protected]> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit cc8ce2e48c31c5196ad5e0bca871acbe0c7492a1 Author: kcz358 <[email protected]> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit 562bb6c15876164ad49392df1a66ed6af84cac76 Merge: 9cf86fa 93534dc Author: kcz358 <[email protected]> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit f2a585a4e5163b51dc31686a32a8aae7fd8e0751 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit e3896d1421b5ba5794db227648ca4316a0170569 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit f6a7654 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) commit 6dbf2a9 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit d1d4ca79d569d5765080160bd8c7e8d432cadd99 Merge: 2475639 f89a736 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit d1815c3465e43a083ab811e8fc8602911a971413 Author: kcz358 <[email protected]> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit 27dbf48 Author: kcz358 <[email protected]> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit bf67bcc02cb57e63952e4429515269458084ea5f Merge: 83358a4 5e1c9c7 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit c3e54461dd77f62aa50bcee8fbbebc14e4470644 Author: kcz358 <[email protected]> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 2a94fb0 Author: kcz358 <[email protected]> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) --------- Co-authored-by: Bo Li <[email protected]> Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: jzhang38 <[email protected]> commit a0ce88c84a9122b793a6b6d352896767fed1f18a Author: kcz358 <[email protected]> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit b892d8eac7f656fafa5d6425b94b3d089e4a5268 Author: kcz358 <[email protected]> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit cbe3e52 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
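The commit notes above mention fixing "llava can not handle only text question (no visuals)", which is the situation the nullable type annotations in this PR's title address. A minimal sketch of the pattern (hypothetical function and parameter names, not the actual `llava_hf.py` code) using `Optional` to let the visuals argument be absent:

```python
from typing import List, Optional

# Hypothetical sketch of allowing a nullable visuals argument.
# The real llava_hf.py signatures may differ; this only illustrates
# the Optional typing pattern the PR title describes.
def generate_until(contexts: List[str], visuals: Optional[List[object]] = None) -> List[str]:
    if visuals is None:
        # Text-only question: skip image preprocessing entirely.
        return [f"text-only: {c}" for c in contexts]
    return [f"multimodal: {c} ({len(visuals)} images)" for c in contexts]
```

With `visuals: List[object]` (non-nullable), a static type checker would flag callers that pass `None` for text-only prompts; widening the annotation to `Optional[List[object]]` makes that call pattern well-typed.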
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
2024 +0000 Record gpt response in eval info commit e372631e911f2e03cc4f579e291e1198c4c11298 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit a68962a Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 0b02105 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit '5e73e8b8a2408bd8193361788669ca80db19cb04' * Refactor ok_vqa_aggreate_submissions function * Merge commit '40099e8b8145bde513b9b7cef8461d8f13d1eafe' * Refactor VQA submission file saving * Update file utils * Merge commit 'a56fe11c00ad4a8b8967be88b93baef6649528c5' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit f4af7d0 Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
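The fix this pull request makes is a typing change: parameters in `llava_hf.py` that may legitimately be `None` were annotated with non-nullable types. A minimal sketch of the pattern follows; the function and parameter names here are illustrative assumptions, not the actual diff:

```python
from typing import List, Optional

# Hypothetical sketch of the nullable-typing pattern (not the real
# llava_hf.py signature): annotations widened to Optional[...] so
# callers may pass None, with None resolved to defaults before use.
def generate(
    prompt: str,
    max_new_tokens: Optional[int] = None,  # None -> fall back to a default budget
    stop: Optional[List[str]] = None,      # None -> no extra stop strings
) -> str:
    max_new_tokens = 128 if max_new_tokens is None else max_new_tokens
    stop = [] if stop is None else stop
    return f"prompt={prompt!r} max_new_tokens={max_new_tokens} stop={stop}"
```

Annotating such parameters as plain `int` or `List[str]` makes static checkers reject the `None` defaults; `Optional[...]` (equivalently `Union[..., None]`) is the conventional fix.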
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024
2024 +0000 Record gpt response in eval info commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7 Author: kcz358 <[email protected]> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 2782eb0 Author: Zhang Peiyuan <[email protected]> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 7e8d3e4 Author: Pu Fanyi <[email protected]> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <[email protected]> Co-authored-by: kcz358 <[email protected]> commit 4fa73ba Author: Li Bo <[email protected]> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update commands.md * Add repr_scripts for reference * Add timeout for gpt4V * Remove unnecessary dependencies * Add reproduce into readme * Revise seedbench process_result * Fix exclude dc hardcode postprocess logic error * Fix metric repeat issue * Update dataset runtime and add environment info * Revise val submission file saving path * Put the correct query into the gpt extraction * Update sleep time in utils.py * update --------- Co-authored-by: Fanyi Pu <[email protected]> Co-authored-by: kcz358 <[email protected]> Co-authored-by: jzhang38 <[email protected]> Co-authored-by: kcz358 <[email protected]>
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request on Oct 6, 2024:
Fix types to allow nullables in `llava_hf.py`
A small fix to allow users to disable options like FA2 from the command line (previously `None` was cast to `str`).
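The bug class behind this fix can be illustrated with a minimal sketch (hypothetical code, not the actual `llava_hf.py` implementation): when every model argument parsed from the command line is unconditionally cast to `str`, a `None` intended to disable a feature such as FA2 arrives as the non-empty, truthy string `"None"`, so the option can never actually be turned off. A nullable (`Optional[str]`) annotation with a `None` passthrough preserves the caller's intent.

```python
from typing import Optional

# Hypothetical sketch of the bug this PR addresses (not the actual
# llava_hf.py code). The function names and signatures are illustrative.

def parse_arg_old(value) -> str:
    # Old behavior: unconditional cast, so None becomes the string "None",
    # which is non-empty and therefore truthy downstream.
    return str(value)

def parse_arg_new(value) -> Optional[str]:
    # Nullable-aware behavior: None passes through untouched and can
    # genuinely disable the option (e.g. attn_implementation=None).
    return None if value is None else str(value)

print(repr(parse_arg_old(None)))   # the truthy string 'None'
print(repr(parse_arg_new(None)))   # a real None
```

Under the old casting, a downstream check like `if attn_implementation:` would still enable the feature, because `"None"` is truthy; with the nullable version the check behaves as the user expects.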