[benchmark][MOAT] Shuffle choices for MOAT #852

waltsun · 2025-03-13T12:00:42Z

Sorry to bother you again.

I checked the MOAT code and found some minor differences from our original settings, which this PR addresses:

- Shuffle the choices if they are given.
- For outside-knowledge text, LMMs should be told that it is a hint.

In theory, these should not cause significant performance changes, but they make the implementation more consistent with ours. We ran a test ourselves with GPT4o_HIGH, and the result is similar to the one in our paper.
(It is reasonable for results to vary slightly from run to run, given the shuffling and the nondeterminism of the model itself.)
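
For readers skimming the thread, the two changes can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_prompt`, its parameters, and the prompt wording are hypothetical and are not the actual MOAT or VLMEvalKit code.

```python
import random


def build_prompt(question, choices=None, outside_knowledge=None, rng=None):
    """Hypothetical prompt builder (illustrative names, not the real API)."""
    if rng is None:
        rng = random  # fall back to the module-level generator
    parts = []
    if outside_knowledge:
        # Tell the model explicitly that this text is a hint.
        parts.append(f"Hint: {outside_knowledge}")
    parts.append(f"Question: {question}")
    if choices:
        shuffled = list(choices)  # copy so the caller's list is not mutated
        rng.shuffle(shuffled)
        parts.append("Choices: " + "; ".join(shuffled))
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Which gear turns fastest?",
                       choices=["gear A", "gear B", "gear C"],
                       outside_knowledge="Smaller gears turn faster."))
```

Copying the list before shuffling keeps the dataset's original choice order intact, which matters if the answer key refers to it.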

Sorry again and thank you for your help.

kennymckormick · 2025-03-20T09:52:47Z

Hi @waltsun,
Is it possible to set a fixed random seed to improve reproducibility?

waltsun · 2025-03-20T10:40:38Z

> Hi @waltsun, is it possible to set a fixed random seed to improve reproducibility?

Sure, thank you for your advice.
I made a commit that sets the seed to zero in the init function of the class, and ran the GPT4o_HIGH setting three times.
The results are 32.28, 31.74, and 31.22. The variation did drop a lot.
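
As a quick check on the spread of these three runs, the summary statistics can be computed directly from the numbers quoted above (nothing here is new data):

```python
from statistics import mean, stdev

# The three GPT4o_HIGH scores reported in this comment.
scores = [32.28, 31.74, 31.22]

mean_score = mean(scores)
score_range = max(scores) - min(scores)
sample_std = stdev(scores)  # sample standard deviation (n - 1 denominator)

print(f"mean={mean_score:.2f} range={score_range:.2f} std={sample_std:.2f}")
```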
These results may still differ from the ones reported in our paper, since our own implementation didn't set a seed. We believe the gap is reasonable.
(In fact, setting a seed in our original implementation would not have worked, since we build prompts concurrently, which makes the order of questions nondeterministic.)
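
To illustrate the point about ordering, here is a minimal sketch. `SeededShuffler` mirrors the idea of seeding once in the init function; `shuffle_per_question` is a hypothetical alternative (not part of this PR) that derives the seed from a question ID so the result does not depend on processing order. All names are assumptions made for illustration.

```python
import random


class SeededShuffler:
    """Seeds one generator at construction time (seed 0, as in the commit).

    Results are reproducible only if questions are always shuffled in the
    same order, because every call advances the shared generator state.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def shuffle(self, choices):
        out = list(choices)
        self.rng.shuffle(out)
        return out


def shuffle_per_question(choices, question_id, seed=0):
    """Hypothetical alternative: one generator per question.

    The seed depends only on (seed, question_id), so the shuffle is the same
    even when prompts are built concurrently in an arbitrary order.
    """
    rng = random.Random(f"{seed}-{question_id}")
    out = list(choices)
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    shuffler = SeededShuffler(seed=0)
    print(shuffler.shuffle(["A", "B", "C", "D"]))
    print(shuffle_per_question(["A", "B", "C", "D"], question_id=7))
```

With a single shared generator, the first design is deterministic as long as the evaluation loop visits questions sequentially in a fixed order, which is the situation the seeded commit relies on.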

Thank you again for your advice. This makes our work much more reproducible.

* [benchmark][MOAT] Shuffle choices for MOAT * [benchmark] [MOAT] set random seed for choice shuffle

[benchmark][MOAT] Shuffle choices for MOAT

59545f9

[benchmark] [MOAT] set random seed for choice shuffle

1d61f57

kennymckormick merged commit 91e1b80 into open-compass:main Mar 22, 2025
7 checks passed

Mercury7353 pushed a commit to Mercury7353/VLMEvalKit that referenced this pull request Apr 28, 2025

[benchmark][MOAT] Shuffle choices for MOAT (open-compass#852)

244b37e

* [benchmark][MOAT] Shuffle choices for MOAT * [benchmark] [MOAT] set random seed for choice shuffle
