
🎉 Reproduce the results successfully, avg score 56.4 #44

Open
JunnYu opened this issue Feb 25, 2025 · 6 comments

Comments

@JunnYu

JunnYu commented Feb 25, 2025

I have reproduced the results from this repo on 8× A800-80G GPUs, achieving an average score of 56.4. The reproduced model is on the Hugging Face Hub: https://huggingface.co/junnyu/DeepScaleR-1.5B-Preview-Reproduce
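
If you want to try the reproduced checkpoint, here is a minimal sketch of pulling it from the Hub and serving it with vLLM (assuming the huggingface_hub CLI and vLLM are installed; the 32768 context limit is just an illustrative value that covers the 24K training length):

# Download the reproduced checkpoint from the Hugging Face Hub
pip install -U "huggingface_hub[cli]" vllm
huggingface-cli download junnyu/DeepScaleR-1.5B-Preview-Reproduce --local-dir ./DeepScaleR-1.5B-Preview-Reproduce

# Serve it with vLLM for quick sampling (context limit is an illustrative choice)
vllm serve ./DeepScaleR-1.5B-Preview-Reproduce --max-model-len 32768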

Thanks for generously open-sourcing the code in this repository. Your efforts have made it possible for others like me to learn, experiment, and build on your work.

Training (8*A800-80G)

repo commit id: 0dbb438

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

# Run 8K context length training, 560 steps
export MODEL_PATH="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
nohup bash run_deepscaler_1.5b_8k.sh --model $MODEL_PATH > stage1.log 2>&1 &

# Run 16K context length training, 250 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-8k/actor/global_step_560"
nohup bash run_deepscaler_1.5b_16k.sh --model $MODEL_PATH > stage2.log 2>&1 &

# Run 24K context length training, 190 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-16k/actor/global_step_250"
nohup bash run_deepscaler_1.5b_24k.sh --model $MODEL_PATH > stage3.log 2>&1 &

# Run 24K context length training, 480 steps
export MODEL_PATH="./checkpoints/deepscaler/deepscaler-1.5b-24k/actor/global_step_190"
nohup bash run_deepscaler_1.5b_24k.sh --model $MODEL_PATH > stage3-continue.log 2>&1 &
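
Since each stage starts from the previous stage's actor checkpoint, every command above has to wait for the preceding one to finish. A rough sketch of chaining the stages in a single script instead (this is only a sketch, under the assumption that each run_deepscaler_1.5b_*.sh script exits after its final step and writes checkpoints to the paths shown above):

#!/bin/bash
# Hypothetical wrapper that runs the stages back to back; checkpoint paths
# and logs mirror the commands above. Stops at the first failing stage.
set -e
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export VLLM_ATTENTION_BACKEND=XFORMERS

bash run_deepscaler_1.5b_8k.sh  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B > stage1.log 2>&1
bash run_deepscaler_1.5b_16k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-8k/actor/global_step_560 > stage2.log 2>&1
bash run_deepscaler_1.5b_24k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-16k/actor/global_step_250 > stage3.log 2>&1
bash run_deepscaler_1.5b_24k.sh --model ./checkpoints/deepscaler/deepscaler-1.5b-24k/actor/global_step_190 > stage3-continue.log 2>&1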

Evaluation

| Model | AIME 2024 | MATH 500 | AMC 2023 | Minerva Math | OlympiadBench | Avg. |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen-2.5-7B-Instruct | 13.3 | 79.8 | 50.6 | 34.6 | 40.7 | 43.8 |
| rStar-Math-7B | 26.7 | 78.4 | 47.5 | - | 47.1 | - |
| Eurus-2-7B-PRIME | 26.7 | 79.2 | 57.8 | 38.6 | 42.1 | 48.9 |
| Qwen2.5-7B-SimpleRL | 26.7 | 82.4 | 62.5 | 39.7 | 43.3 | 50.9 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.8 | 82.8 | 62.9 | 26.5 | 43.3 | 48.9 |
| Still-1.5B | 32.5 | 84.4 | 66.7 | 29.0 | 45.4 | 51.6 |
| DeepScaleR-1.5B-Preview | 43.1 | 87.8 | 73.6 | 30.2 | 50.0 | 57.0 |
| 🎉 DeepScaleR-1.5B-Preview-Reproduce | 40.4 | 87.9 | 72.0 | 31.5 | 50.2 | 56.4 |
| O1-Preview | 40.0 | 81.4 | - | - | - | - |
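
For reference, the Avg. column is the unweighted mean of the five benchmark scores; for the reproduced model, (40.4 + 87.9 + 72.0 + 31.5 + 50.2) / 5 = 56.4.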

Wandb Log

[Three Wandb screenshots of the training curves]

@QuyAnh2005

@JunnYu Congratulations!! If you don't mind, can I ask a question?

How many hours did the whole process take, or just one epoch?

@jumptoliujj

Could you share the trend of the val test score during training?

@lambda7xx

lambda7xx commented Feb 25, 2025

That's awesome. Thanks for your great work.

@JunnYu

JunnYu commented Feb 25, 2025

@jumptoliujj

[Three Wandb screenshots of the validation test score curves during training]

@JunnYu

JunnYu commented Feb 25, 2025

@QuyAnh2005 8K: 2d 22h 22m 5s, 16K: 1d 23h 45m 35s, 24K: 3d 23h 6m 9s
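
(Summed across the three stages, that is about 8 days 21 hours of wall-clock time.)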

@jumptoliujj

jumptoliujj commented Feb 25, 2025

Thanks!
