flash_attn version discussion #148
Our reproduced model underperforms the released model by 4-5 pp (on MVBench), and we are wondering whether a flash_attn version mismatch could be the cause.
The release used flash_attn==1.0.4. On our machines, installing flash_attn==1.0.4 fails, but flash_attn==2.4.2 installs without problems. Since flash_attn 2.4.2 is a complete rewrite relative to 1.0.4, we'd like to know whether upgrading flash_attn can affect model performance, and whether your team has trained and evaluated the model with flash_attn==2.4.2.

We haven't tested different flash-attn versions, but I don't think it matters much. Could you evaluate the released open-source model directly and check whether the results match? Are there any differences in the training data or hyperparameters? Do the results on other benchmarks differ by as much?
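When chasing a reproduction gap like this, a first step is to record the exact installed versions of the packages most likely to matter. A minimal sketch (the package list is illustrative, not the repo's official diagnostic):

```python
from importlib import metadata

def get_version(pkg: str) -> str:
    """Return the installed distribution version of `pkg`,
    or 'not installed' if it cannot be found."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return "not installed"

if __name__ == "__main__":
    # Log the environment alongside evaluation results so runs
    # on different machines can be compared later.
    for pkg in ("flash-attn", "torch", "transformers"):
        print(f"{pkg}: {get_version(pkg)}")
```

Attaching this output to both the training and evaluation runs makes it easy to rule version skew in or out before digging into data or hyperparameter differences.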