MMLU evaluation results differ significantly from the official results #267
Comments
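Generally speaking, evaluation frameworks such as OpenCompass and harness, as well as a model's official evaluation logic, differ in prompt construction (including, when few-shot is used, the sampling logic and each model's ability to follow the few-shot examples), in model inference parameters, and in result-parsing logic. These differences ultimately lead to different evaluation results. We usually recommend sticking to a single framework when comparing models side by side.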
Is it possible that the official evaluation uses micro_avg?
Our final score is also computed in a way similar to micro avg. We will support more evaluation metrics later.
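For anyone comparing numbers, here is a minimal sketch of how micro and macro averaging can diverge on a multi-subject benchmark like MMLU. It is not the code of this framework or of any official evaluation; the function names `macro_avg` and `micro_avg`, the subject names, and the counts are all made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical per-subject results: subject -> (num_correct, num_questions).
# The names and counts are illustrative, not real MMLU numbers.
results: Dict[str, Tuple[int, int]] = {
    "subject_a": (90, 100),
    "subject_b": (10, 50),
}


def macro_avg(per_subject: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of per-subject accuracies (every subject counts equally)."""
    accuracies = [correct / total for correct, total in per_subject.values()]
    return sum(accuracies) / len(accuracies)


def micro_avg(per_subject: Dict[str, Tuple[int, int]]) -> float:
    """Accuracy over all questions pooled together (every question counts equally)."""
    total_correct = sum(correct for correct, _ in per_subject.values())
    total_questions = sum(total for _, total in per_subject.values())
    return total_correct / total_questions


if __name__ == "__main__":
    print(f"macro avg: {macro_avg(results):.4f}")
    print(f"micro avg: {micro_avg(results):.4f}")
```

With these made-up counts the macro average is (0.9 + 0.2) / 2 = 0.55, while the micro average is 100 / 150, roughly 0.667. When subjects have very different numbers of questions, the choice of averaging alone can shift the reported score, independently of the prompt and parsing differences mentioned above.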