Reproduce deepscaler #33
Comments
@michaelzhiluo What GPU are you using? And how should I adjust the parameters if I only have 8 GPUs with 40G? I found that simply changing data.train_batch_size from 256 to 128 still OOMs, but it works fine with data.train_batch_size=16. Why is that?
You should set the train batch size to be much larger; it will only OOM if your micro batch size is too large, and the train batch size shouldn't matter. I recommend a batch size of at least 128-256 and 8-16 samples per problem. That way you get meaningful gradients rather than the super noisy ones that are a common problem in RL.
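For reference, here is a minimal sketch of what that advice looks like as verl-style hydra overrides. The entry point and the exact flag names (especially the actor_rollout_ref.* keys) are assumptions and may differ between deepscaler/verl versions, so check the repo's run scripts for the real ones.

```bash
# Sketch only -- flag names assumed from verl-style configs; verify against
# the actual run script shipped with this repo.
python3 -m verl.trainer.main_ppo \
    data.train_batch_size=128 \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size=2
# train_batch_size (128-256) and rollout.n (8-16 samples per problem) control
# gradient quality; the micro batch size is the per-GPU forward/backward chunk,
# so it is the knob that actually determines peak memory.
```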
I set every *_micro_batch_size to 2 and it still OOMs.
Question: Have you fixed the bug yet? |
A100-80GB GPUs, which seem to be the minimum standard these days (everyone is using H100s now...). Train batch size shouldn't affect OOM; it is most likely the mini batch size or micro batch size. Try narrowing that down, e.g. set the micro batch size to 1.
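A hedged example of that isolation step: keep the large global batch but drop every micro batch size to 1, so that any remaining OOM must come from the per-GPU chunk (model states, sequence length, rollout engine) rather than from the batch size itself. Again, the flag names are assumptions based on verl-style configs.

```bash
# Debugging sketch, not the official recipe -- key names may differ by version.
python3 -m verl.trainer.main_ppo \
    data.train_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=1 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size=1 \
    actor_rollout_ref.ref.log_prob_micro_batch_size=1
# If this still OOMs, the memory pressure is not coming from the batch
# dimension at all (e.g. 40G cards vs. the A100-80GB setup the run was tuned
# for, or the response length).
```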
This was already fixed. We fixed it after realizing the 8k run had such high KL loss; verl has also discovered the same bug here: volcengine/verl@a65c915
Can you please release the 8K wandb log? I see the same phenomenon as [merlinarer].
My training log is similar to [merlinarer]'s, too. I use 8xA100 80G for my experiments.
Hello, I tried to reproduce deepscaler on my 8*A80G machine without any change to the code, but found a different trend (wandb shown below):
This, in my understanding, indicates that 8k is still enough, since clip_ratio doesn't increase; maybe the 8k run should continue for more steps?