
[benchmark][MOAT] Shuffle choices for MOAT #852


Merged
merged 2 commits into from
Mar 22, 2025


@waltsun (Contributor) commented Mar 13, 2025

Sorry to bother you again.

I checked the MOAT code and found some minor differences from our original settings. In our original settings:

  1. The choices are shuffled, if any are given.
  2. When outside-knowledge text is provided, the LMM is told that it is a hint.

In theory, these differences should not change performance significantly, but aligning them makes the implementation more consistent. We ran a test ourselves with GPT4o_HIGH, and the result is similar to the one in our paper.
(It is reasonable for the results to vary slightly from run to run, because of the shuffling and the nondeterminism of the model itself.)
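
For readers who want to see the two settings concretely, here is a minimal sketch. It is not the code from this PR: `build_prompt`, its parameters, and the prompt layout are hypothetical names chosen for illustration. It assumes a multiple-choice question with a known correct index, shuffles the choices, remaps the answer letter, and prefixes any outside-knowledge text with "Hint:".

```python
import random
import string


def build_prompt(question, choices, answer_idx, outside_knowledge=None, rng=None):
    """Hypothetical helper: shuffle choices and label outside knowledge as a hint.

    Returns the prompt text and the letter of the correct choice after shuffling.
    """
    rng = rng or random.Random()

    # Shuffle indices rather than the choices so the answer can be remapped.
    order = list(range(len(choices)))
    rng.shuffle(order)

    letters = string.ascii_uppercase
    lines = [f"Question: {question}"]
    if outside_knowledge:
        # Tell the model explicitly that this text is a hint.
        lines.append(f"Hint: {outside_knowledge}")
    for pos, src in enumerate(order):
        lines.append(f"{letters[pos]}. {choices[src]}")

    # The correct choice moved to wherever its original index landed.
    new_answer = letters[order.index(answer_idx)]
    return "\n".join(lines), new_answer


if __name__ == "__main__":
    prompt, answer = build_prompt(
        "Which animal is shown?",
        ["cat", "dog", "bird"],
        answer_idx=1,
        outside_knowledge="The animal in the image barks.",
    )
    print(prompt)
    print("Correct letter:", answer)
```

Shuffling indices instead of the choice strings keeps the answer key in sync with the new order, which matters when scoring is done by letter.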

Sorry again and thank you for your help.

@kennymckormick (Member) commented

Hi @waltsun,
Is it possible to set a fixed random seed to improve reproducibility?

@waltsun (Contributor, Author) commented Mar 20, 2025

> Hi @waltsun, is it possible to set a fixed random seed to improve reproducibility?

Sure, thank you for your advice.
I made a commit that sets the seed to zero in the init function of the class, and ran the GPT4o_HIGH setting three times.
The results are 32.28, 31.74, and 31.22, so the variation dropped a lot.
These results may still differ from the ones reported in our paper, since our own implementation did not set a seed. We believe the gap is reasonable.
(In fact, setting a seed in our original implementation would not have worked: we built the prompts concurrently, which makes the order of the questions nondeterministic.)
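
The seeding approach described above can be sketched as follows. This is an illustration under assumptions, not the merged code: `SeededShuffler` is a hypothetical stand-in for the dataset class, and `shuffle_for_question` is an alternative technique that this PR does not use. The alternative derives a separate seed from each question ID, so the shuffle no longer depends on the order in which questions are processed, which is the problem with concurrent prompt building.

```python
import hashlib
import random


class SeededShuffler:
    """Hypothetical stand-in for a dataset class that seeds its RNG in __init__."""

    def __init__(self, seed=0):
        # One private RNG per instance; reproducible only if questions
        # are always shuffled in the same order.
        self.rng = random.Random(seed)

    def shuffle(self, choices):
        shuffled = list(choices)
        self.rng.shuffle(shuffled)
        return shuffled


def shuffle_for_question(question_id, choices, seed=0):
    """Alternative (not from the PR): derive the seed from the question ID.

    The result depends only on (seed, question_id), so it is stable even
    when prompts are built concurrently in an arbitrary order.
    """
    digest = hashlib.sha256(f"{seed}:{question_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    shuffled = list(choices)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    choices = ["a", "b", "c", "d"]
    first, second = SeededShuffler(0), SeededShuffler(0)
    print(first.shuffle(choices) == second.shuffle(choices))  # same seed, same order
    print(shuffle_for_question("q1", choices) == shuffle_for_question("q1", choices))
```

A shared RNG is the simpler choice when prompts are built sequentially; the per-question seed trades a little extra code for independence from processing order.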

Thank you again for your advice. This makes our work much more reproducible.

@kennymckormick kennymckormick merged commit 91e1b80 into open-compass:main Mar 22, 2025
7 checks passed
Mercury7353 pushed a commit to Mercury7353/VLMEvalKit that referenced this pull request Apr 28, 2025
* [benchmark][MOAT] Shuffle choices for MOAT

* [benchmark] [MOAT] set random seed for choice shuffle