Questions about Dataset and Experiments #8


Open
LLMforScience opened this issue Mar 9, 2025 · 8 comments

Comments

@LLMforScience

Thank you for your work.

I noticed that the MATH dataset you used contains 7.5K questions. However, the Qwen2.5-7B-SimpleRL model uses 8,523 questions. Could you please clarify the differences?

@WeiXiongUST
Contributor

MATH = 7.5K training set + 5K test set. In practice, however, researchers often evaluate on a fixed 500-sample subset of the test set (MATH-500) to measure the mathematical reasoning ability of LLMs. The SimpleRL project therefore trains on 7.5K (train) + 4.5K (the remaining test problems). They also filter out easy prompts, which leaves 8,523 samples in the end.
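The pool construction described above can be sketched as follows (the field names and the difficulty filter are illustrative assumptions, not the actual SimpleRL code):

```python
# Sketch of the SimpleRL training-pool construction: MATH train plus the
# test problems not reserved for evaluation, then a difficulty filter.
# Dataset records and the `keep` predicate here are placeholders.

def build_training_pool(train, test, eval_subset_ids, keep):
    """Combine MATH train with the test problems outside the held-out
    eval subset, then drop prompts the difficulty filter rejects."""
    pool = list(train) + [p for p in test if p["id"] not in eval_subset_ids]
    return [p for p in pool if keep(p)]

# Toy numbers mirroring the sizes in this thread: 7,500 train problems,
# 5,000 test problems, 500 of which are held out for evaluation.
train = [{"id": f"tr{i}"} for i in range(7500)]
test = [{"id": f"te{i}"} for i in range(5000)]
eval_ids = {f"te{i}" for i in range(500)}

pool = build_training_pool(train, test, eval_ids, keep=lambda p: True)
print(len(pool))  # 7,500 + 4,500 = 12,000 before easy-prompt filtering
```

With a real difficulty filter in place of `keep=lambda p: True`, the 12,000 candidates would be reduced to the 8,523 prompts mentioned above.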

@LLMforScience
Author

LLMforScience commented Mar 14, 2025

Thank you very much for your reply! Could you please release your SFT training code and YAML config? Thank you for your help.

@LLMforScience
Author

In addition, I noticed that you mentioned: 'We generate 15K long CoT data using QwQ-32B on the MATH training set.' However, the MATH training set contains only 7.5K problems. Did you generate multiple CoTs per problem?

@LLMforScience LLMforScience changed the title from "Dataset Mismatch" to "Questions about Dataset and Experiments" on Mar 14, 2025
@WeiXiongUST
Contributor

Hi, we use this dataset: RLHFlow/qwq_gen_sft_15k for the SFT training. The config is

https://wandb.ai/axolotl-ai/qwen-im-end/runs/uz5v11h9/files/tmp/axolotl_config_g2g7wwya.yml

but we use a learning rate of 1e-5, a global batch size of 32, and train for 1 epoch.
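Combined with the linked axolotl config, those overrides would look roughly like this (key names follow axolotl's YAML conventions; the micro-batch/accumulation split is an assumption, only the three stated values come from this thread):

```yaml
# Sketch of the SFT overrides on top of the linked wandb/axolotl config.
learning_rate: 1.0e-5        # stated above
num_epochs: 1                # stated above
# Global batch size 32 = micro_batch_size x gradient_accumulation_steps x num_gpus.
# The split below is one possible choice for 8 GPUs, not confirmed by the authors.
micro_batch_size: 1
gradient_accumulation_steps: 4
```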

@WeiXiongUST
Contributor

Yes, there are multiple responses per prompt.

@LLMforScience
Author

LLMforScience commented Mar 14, 2025

Thank you for your reply. Did you generate two CoTs per prompt, since 7.5K × 2 = 15K?

@WeiXiongUST
Contributor

Actually, we did not generate the data ourselves; we use a randomly selected subset of justus27/qwq_cot_sampled_math.

@LLMforScience
Author

LLMforScience commented Mar 21, 2025

Thanks! You mentioned that 'For SFT/RAFT, we use 8xA100 40G. For DPO and PPO, we use 8xA100/H100 80G.' Could you please tell me how many hours of training SFT, PPO, and DPO each take?
