Questions about Dataset and Experiments #8
Thank you for your work.

I noticed that the MATH dataset you used contains 7.5K questions, but the Qwen2.5-7B-SimpleRL model uses 8,523 questions. Could you please clarify the difference?
MATH = 7.5K training set + 5K test set. However, in practice, researchers often evaluate on a fixed 500-sample subset of the test set (MATH-500) to measure the mathematical reasoning ability of LLMs. So the SimpleRL project uses the 7.5K training problems plus 4.5K problems from the test set to train the model. They also filter out the easy prompts, ending up with 8,523 prompts.
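For concreteness, the accounting above can be sketched in a few lines of Python. The numbers come from this comment; the variable names are just for illustration, and the easy-prompt filtering criterion itself is not specified in this thread:

```python
# Back-of-the-envelope accounting for the SimpleRL training pool.
math_train = 7_500        # MATH training set
math_test = 5_000         # MATH test set
held_out_eval = 500       # fixed 500-sample evaluation subset (MATH-500)

# 7.5K train + 4.5K unused test problems = 12K candidate prompts.
candidate_pool = math_train + (math_test - held_out_eval)
print(candidate_pool)     # 12000

# After filtering out easy prompts, 8,523 remain (reported figure).
after_easy_filter = 8_523
```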
Thank you very much for your reply! Could you please release your SFT training code and YAML config? Thank you for your help.
In addition, I noticed that you mentioned, 'We generate 15K long CoT data using QwQ-32B on the MATH training set.' However, the MATH training set contains only 7.5K problems. Did you generate multiple CoTs per problem?
Hi, we use the RLHFlow/qwq_gen_sft_15k dataset for the SFT training. The config is https://wandb.ai/axolotl-ai/qwen-im-end/runs/uz5v11h9/files/tmp/axolotl_config_g2g7wwya.yml, but we use a learning rate of 1e-5, a global batch size of 32, and train for 1 epoch.
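Since the axolotl YAML is only linked rather than inlined above, here is a rough Hugging Face `TrainingArguments` equivalent of the reported hyperparameters. This is a sketch, not the authors' actual config; the output path is hypothetical, and the per-device batch size assumes 8 GPUs with no gradient accumulation:

```python
from transformers import TrainingArguments

# Sketch of the reported SFT settings: lr 1e-5, global batch size 32, 1 epoch.
args = TrainingArguments(
    output_dir="qwq-sft",            # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # 4 per device x 8 GPUs = 32 global (assumption)
    gradient_accumulation_steps=1,
    num_train_epochs=1,
)
```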
Yes, there are multiple responses per prompt.
Thank you for your reply. Did you generate two CoTs per prompt, since 7.5K × 2 = 15K?
Actually, we did not generate the data ourselves; we use a randomly selected subset of justus27/qwq_cot_sampled_math.
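A minimal sketch of drawing such a 15K subset with the `datasets` library (the split name and seed are assumptions; the authors' exact sampling procedure is not stated):

```python
from datasets import load_dataset

# Randomly sample 15K examples from the public QwQ CoT pool.
pool = load_dataset("justus27/qwq_cot_sampled_math", split="train")
subset = pool.shuffle(seed=0).select(range(15_000))
print(len(subset))  # 15000
```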
Thanks! You mentioned that 'For SFT/RAFT, we use 8xA100 40G. For DPO and PPO, we use 8xA100/H100 80G.' Could you please tell me how many hours the training takes for SFT, PPO, and DPO?