Multi-node multi-GPU training performs worse than single-node multi-GPU #111
@DePengW
Hello, I ran into a problem during the SFT stage while training the LLaMA-7B model. With all training hyperparameters held constant (lr, steps, weight_decay, warmup, etc.):
Setting 1: 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=2, total_batch_size = 128
Setting 2: 2 x 8 x A100, per_device_train_batch_size=16, gradient_accumulation_steps=1, total_batch_size = 128
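For reference, the effective global batch size is typically computed as world_size × per_device_train_batch_size × gradient_accumulation_steps. A quick sketch (the helper function is ours, not from the repo) confirms that the two settings above see the same number of samples per optimizer step, so the comparison is controlled on that axis:

```python
def global_batch_size(num_nodes: int, gpus_per_node: int,
                      per_device_bs: int, grad_accum: int) -> int:
    """Samples consumed per optimizer step across all ranks."""
    return num_nodes * gpus_per_node * per_device_bs * grad_accum

# Setting 1: 1 node x 8 A100, per-device batch 16, grad accumulation 2
bs1 = global_batch_size(1, 8, 16, 2)
# Setting 2: 2 nodes x 8 A100, per-device batch 16, grad accumulation 1
bs2 = global_batch_size(2, 8, 16, 1)

assert bs1 == bs2  # both settings use the same global batch size
```

Since the global batch sizes match, any quality gap would have to come from something else in the distributed setup (e.g. data sharding, gradient synchronization, or randomness across nodes) rather than from a batch-size mismatch.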
Setting 1 beats setting 2 on every metric. Have you looked into this?
I see the same behavior with Gemma-2B, so this may be a general issue of multi-node multi-GPU versus single-node multi-GPU training. Have you noticed it before, and is there a known workaround?