Questions about Dataset and Experiments #8


Open
LLMforScience opened this issue Mar 9, 2025 · 8 comments

Comments

@LLMforScience

Thank you for your work.

I noticed that the MATH dataset you used contains 7.5K questions. However, the Qwen2.5-7B-SimpleRL model uses 8,523 questions. Could you please clarify the differences?

@WeiXiongUST
Contributor

MATH = 7.5K training set + 5K test set. In practice, however, researchers often evaluate on a fixed 500-sample subset of the test set (MATH-500) to measure the mathematical reasoning ability of LLMs. The SimpleRL project therefore trains on 7.5K (train) + 4.5K (the remaining test problems). They also filter out easy prompts, which leaves 8,523 samples in the end.
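The pool construction described above can be sketched as follows (the field names and the difficulty filter are illustrative assumptions, not the actual SimpleRL code):

```python
# Sketch of the SimpleRL training-pool construction: MATH train plus the
# test problems not reserved for evaluation, then a difficulty filter.
# Dataset records and the `keep` predicate here are placeholders.

def build_training_pool(train, test, eval_subset_ids, keep):
    """Combine MATH train with the test problems outside the held-out
    eval subset, then drop prompts the difficulty filter rejects."""
    pool = list(train) + [p for p in test if p["id"] not in eval_subset_ids]
    return [p for p in pool if keep(p)]

# Toy numbers mirroring the sizes in this thread: 7,500 train problems,
# 5,000 test problems, 500 of which are held out for evaluation.
train = [{"id": f"tr{i}"} for i in range(7500)]
test = [{"id": f"te{i}"} for i in range(5000)]
eval_ids = {f"te{i}" for i in range(500)}

pool = build_training_pool(train, test, eval_ids, keep=lambda p: True)
print(len(pool))  # 7,500 + 4,500 = 12,000 before easy-prompt filtering
```

With a real difficulty filter in place of `keep=lambda p: True`, the 12,000 candidates would be reduced to the 8,523 prompts mentioned above.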

@LLMforScience
Author

LLMforScience commented Mar 14, 2025

Thank you very much for your reply! Could you please release your SFT training code and YAML config? Thank you for your help.

@LLMforScience
Author

In addition, I noticed that you mentioned: 'We generate 15K long CoT data using QwQ-32B on the MATH training set.' However, the MATH training set contains only 7.5K problems. Did you generate multiple CoTs per problem?

@LLMforScience LLMforScience changed the title from "Dataset Mismatch" to "Questions about Dataset and Experiments" on Mar 14, 2025
@WeiXiongUST
Contributor

Hi, we use this dataset: RLHFlow/qwq_gen_sft_15k for the SFT training. The config is

https://wandb.ai/axolotl-ai/qwen-im-end/runs/uz5v11h9/files/tmp/axolotl_config_g2g7wwya.yml

but we use a learning rate of 1e-5, a global batch size of 32, and train for 1 epoch.
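Combined with the linked axolotl config, those overrides would look roughly like this (key names follow axolotl's YAML conventions; the micro-batch/accumulation split is an assumption, only the three stated values come from this thread):

```yaml
# Sketch of the SFT overrides on top of the linked wandb/axolotl config.
learning_rate: 1.0e-5        # stated above
num_epochs: 1                # stated above
# Global batch size 32 = micro_batch_size x gradient_accumulation_steps x num_gpus.
# The split below is one possible choice for 8 GPUs, not confirmed by the authors.
micro_batch_size: 1
gradient_accumulation_steps: 4
```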

@WeiXiongUST
Contributor

Yes, there are multiple responses per prompt.

@LLMforScience
Author

LLMforScience commented Mar 14, 2025

Thank you for your reply. Did you generate two CoTs per prompt, since 7.5K × 2 = 15K?

@WeiXiongUST
Contributor

Actually, we did not generate the data ourselves; we use a randomly selected subset of justus27/qwq_cot_sampled_math.

@LLMforScience
Author

LLMforScience commented Mar 21, 2025

Thanks! You mentioned that 'For SFT/RAFT, we use 8xA100 40G. For DPO and PPO, we use 8xA100/H100 80G.' Could you please tell me how many hours of training SFT, PPO, and DPO each take?
