When will chatglm with lora support multi-GPU fine-tuning? #15
No description provided.
Comments
Putting the model and the data on different devices is enough to get model parallelism; you can refer to this implementation: https://github.com/yuanzhoulvpi2017/zero_nlp/tree/main/Chatglm6b_ModelParallel
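For reference, a minimal sketch of that idea using `accelerate`'s `device_map` to spread the layers over the visible GPUs; the checkpoint id `THUDM/chatglm-6b` is an assumption, and the linked Chatglm6b_ModelParallel repo may split the model manually instead:

```python
# Sketch only: let accelerate place ChatGLM-6B layers across the visible GPUs,
# then move the inputs to the first device. Adjust to match the repo's approach.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "THUDM/chatglm-6b"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",  # requires `pip install accelerate`
)

inputs = tokenizer("你好", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    out = model(**inputs)
```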
My GPU is a model on which half precision is much faster than single precision, but setting fp16=true doesn't seem to speed training up. Do I need to add other parameters? training_chatglm_csc_demo.py: 102
I'm still working on this. At the moment fp16 training only reduces GPU memory usage; it does not speed training up.
+1. I also tried tinkering with int8; after a few changes it still gets stuck at
Below are some small changes; only after applying them do you get as far as the error above
The transformers documentation also seems to say that fp16 mainly saves memory at large batch sizes, and that getting an actual speedup puts strict requirements on the model. Also, I tried setting the batch size to 4, which gave roughly a 20% speedup (the V100 has 32 GB of memory); setting it to 8 gave another 5%, but then training no longer produced useful results.
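For what it's worth, a minimal sketch of how fp16 and batch size are usually set through the Hugging Face `TrainingArguments`; how training_chatglm_csc_demo.py wires these options up may differ, and the `outputs` directory is just a placeholder:

```python
# Sketch only: fp16=True enables mixed precision (autocast + GradScaler) in the
# HF Trainer. It mainly cuts memory; whether it also speeds training up depends
# on the GPU and on the effective batch size.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,   # batch size 4 gave ~20% speedup on a 32 GB V100 above
    gradient_accumulation_steps=1,
    fp16=True,
)
```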
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. (Closed automatically by the bot due to inactivity; feel free to ask again if needed.)