
🎉 Reproduce the results successfully, avg score 56.4 #44

Open
JunnYu opened this issue Feb 25, 2025 · 6 comments

Comments

@JunnYu

JunnYu commented Feb 25, 2025

I have reproduced the results from this repo on 8× A800-80G GPUs, achieving an average score of 56.4. The reproduced model is on the Hugging Face Hub: https://huggingface.co/junnyu/DeepScaleR-1.5B-Preview-Reproduce
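
If you want to try the reproduced checkpoint, here is a minimal sketch of pulling it from the Hub and serving it with vLLM (assuming the huggingface_hub CLI and vLLM are installed; the 32768 context limit is just an illustrative value that covers the 24K training length):

# Download the reproduced checkpoint from the Hugging Face Hub
pip install -U "huggingface_hub[cli]" vllm
huggingface-cli download junnyu/DeepScaleR-1.5B-Preview-Reproduce --local-dir ./DeepScaleR-1.5B-Preview-Reproduce

# Serve it with vLLM for quick sampling (context limit is an illustrative choice)
vllm serve ./DeepScaleR-1.5B-Preview-Reproduce --max-model-len 32768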

Thanks for generously open-sourcing the code in this repository. Your efforts have made it possible for others like me to learn, experiment, and build on your work.

Training (8*A800-80G)

repo commit id: 0dbb438

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

# Run 8K context length training, 560 steps
export MODEL_PATH="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
nohup bash run_deepscaler_1.5b_8k.sh --model $MODEL_PATH > stage1.log 2>&1 &

# Run 16K context length training, 250 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-8k/actor/global_step_560"
nohup bash run_deepscaler_1.5b_16k.sh --model $MODEL_PATH > stage2.log 2>&1 &

# Run 24K context length training, 190 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-16k/actor/global_step_250"
nohup bash run_deepscaler_1.5b_24k.sh --model $MODEL_PATH > stage3.log 2>&1 &

# Run 24K context length training, 480 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-24k/actor/global_step_190"
nohup bash run_deepscaler_1.5b_24k.sh --model $MODEL_PATH > stage3-continue.log 2>&1 &
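
Since each stage starts from the previous stage's actor checkpoint, every command above has to wait for the preceding one to finish. A rough sketch of chaining the stages in a single script instead (this is only a sketch, under the assumption that each run_deepscaler_1.5b_*.sh script exits after its final step and writes checkpoints to the paths shown above):

#!/bin/bash
# Hypothetical wrapper that runs the stages back to back; checkpoint paths
# and logs mirror the commands above. Stops at the first failing stage.
set -e
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

bash run_deepscaler_1.5b_8k.sh  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B > stage1.log 2>&1
bash run_deepscaler_1.5b_16k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-8k/actor/global_step_560 > stage2.log 2>&1
bash run_deepscaler_1.5b_24k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-16k/actor/global_step_250 > stage3.log 2>&1
bash run_deepscaler_1.5b_24k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-24k/actor/global_step_190 > stage3-continue.log 2>&1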

Evaluation

| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen-2.5-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| 🎉 DeepScaleR-1.5B-Preview-Reproduce | 40.4 | 87.9 | 72.0 | 31.5 | 50.2 | 56.4 |
| O1-Preview | 40.0 | 81.4 | - | - | - | - |
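
For reference, the Avg. column is the unweighted mean of the five benchmark scores; for the reproduced model, (40.4 + 87.9 + 72.0 + 31.5 + 50.2) / 5 = 56.4.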

Wandb Log

[Three Wandb screenshots of the training curves]

@QuyAnh2005

@JunnYu Congratulations!! If you don't mind, can I ask a question?

How many hours did the whole process take, or just one epoch?

@jumptoliujj

Could you share the trend of the val test score during training?

@lambda7xx

lambda7xx commented Feb 25, 2025

That's awesome. Thanks for your great work.

@JunnYu

JunnYu commented Feb 25, 2025

@jumptoliujj

[Three Wandb screenshots of the validation test score curves during training]

@JunnYu

JunnYu commented Feb 25, 2025

@QuyAnh2005 8K: 2d 22h 22m 5s, 16K: 1d 23h 45m 35s, 24K: 3d 23h 6m 9s
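
(Summed across the three stages, that is about 8 days 21 hours of wall-clock time.)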

@jumptoliujj

jumptoliujj commented Feb 25, 2025

Thanks!
