Awesome-deep-reasoning

A collection of awesome works around reasoning models such as o1/R1. You can also find the collection here.

News

OpenAI has released its Deep Research capability.
OpenAI has launched its latest o3 models, o3-mini and o3-mini-high, which are optimized for science, math, and coding. Both models are available in the ChatGPT app, Poe, and other platforms.
NVIDIA NIM now supports the DeepSeek-R1 model.
Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, available on the Bailian platform.
CodeGPT: the VS Code copilot extension now supports R1.

Highlights

DeepSeek repos:

DeepSeek-R1 - DeepSeek-R1 official repository.

Qwen repos:

Qwen-QwQ - Official Qwen2.5 repository, including QwQ.

S1 from Stanford - From Fei-Fei Li's team: a distillation and test-time compute implementation that can match the performance of o1 and R1.

Papers

DeepSeek-R1-Tech-Report
DeepSeek-V3 Tech-Report
Qwen QwQ Technical blog - QwQ: Reflect Deeply on the Boundaries of the Unknown
OpenAI-o1 Announcement - Learning to Reason with Large Language Models
Qwen-Math-PRM Tech-Report (MCTS/PRM)
Qwen2.5 Tech-Report
DeepSeekMath Tech-Report (GRPO)
Kimi K1.5 Tech-Report
Qwen-Math-PRM - The Lessons of Developing Process Reward Models in Mathematical Reasoning
Large Language Models for Mathematical Reasoning: Progresses and Challenges (EACL 2024)
Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)
At Which Training Stage Does Code Data Help LLM Reasoning? (ICLR 2024)
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought [ code ]
LlamaV-o1 - Rethinking Step-by-step Visual Reasoning in LLMs
rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
MathScale - Scaling Instruction Tuning for Mathematical Reasoning
LLMs Can Plan Only If We Tell Them - A new CoT method: AoT+
SFT Memorizes, RL Generalizes - A study from DeepMind comparing the effects of SFT and RL.
Frontier AI systems have surpassed the self-replicating red line - A paper from Fudan University suggesting that LLMs have surpassed the self-replicating red line.
LIMO - Less is More for Reasoning: uses only 817 training samples to train a model that surpasses o1-level models.
Underthinking of Reasoning models - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Competitive Programming with Large Reasoning Models - A paper from OpenAI.
Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy - Sky-T1-32B-Flash, a reasoning language model that significantly reduces overthinking.
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
OverThink: Slowdown Attacks on Reasoning LLMs
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - (DeepSeek) NSA: A natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.

Models

DeepSeek series:

Model ID	ModelScope	Hugging Face
DeepSeek R1	Model Link	Model Link
DeepSeek V3	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-32B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-14B	Model Link	Model Link
DeepSeek-R1-Distill-Llama-8B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-7B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-1.5B	Model Link	Model Link
DeepSeek-R1-GGUF	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-32B-GGUF	Model Link	Model Link
DeepSeek-R1-Distill-Llama-8B-GGUF	Model Link	Model Link

Qwen series:

Model ID	ModelScope	Hugging Face
QwQ-32B-Preview	Model Link	Model Link
QVQ-72B-Preview	Model Link	Model Link
QwQ-32B-Preview-GGUF	Model Link	Model Link
QVQ-72B-Preview-bnb-4bit	Model Link	Model Link

Others:

Model ID	ModelScope	Hugging Face
Qwen2-VL-2B-GRPO-8k	-	Model Link

Infra

Open R1 by Hugging Face: https://github.com/huggingface/open-r1
- Hugging Face's official repository for reproducing the DeepSeek-R1 training pipeline.
TinyZero: https://github.com/Jiayi-Pan/TinyZero
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero
SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
- Uses OpenRLHF to reproduce DeepSeek-R1
Ragen: https://github.com/ZihanWang314/RAGEN
- A general-purpose reasoning-agent training framework that reproduces DeepSeek-R1
TRL: https://github.com/huggingface/trl
- Hugging Face's official training framework, with support for GRPO and other RL algorithms.
OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
- An RL framework supporting multiple RL algorithms, including REINFORCE++
Logic-RL: https://github.com/Unakar/Logic-RL
Align-Anything: https://github.com/PKU-Alignment/align-anything
- Training All-modality Model with Feedback
R-Chain: A lightweight toolkit for distilling reasoning models
- https://github.com/modelscope/r-chain
Math Verify: A robust mathematical expression evaluation system designed for assessing Large Language Model outputs in mathematical tasks.
- https://github.com/huggingface/Math-Verify
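TRL above supports GRPO, the RL algorithm introduced in the DeepSeekMath report. As a rough illustration only, GRPO samples a group of completions for each prompt and normalizes each completion's reward against the group's mean and standard deviation, which replaces a learned value function as the baseline. The sketch below shows just that advantage step in plain Python; the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the binary correctness rewards are illustrative assumptions, not the API of TRL or any other library.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalize rewards within one group of completions
    sampled for the same prompt, A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    # Population std over the group (assumption; some implementations use the sample std).
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 completions for one math prompt; reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get a positive advantage and incorrect ones a negative one. If every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.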

Datasets

OpenR1-Math-220k ModelScope | HuggingFace
OpenR1-Math-Raw ModelScope | HuggingFace
MathR - A dataset distilled from DeepSeek-R1 on hard NuminaMath problems.
Dolphin-R1 (HuggingFace | ModelScope) - An 800k-sample dataset similar in composition to the one used to train the DeepSeek-R1 Distill models.
R1-Distill-SFT (HuggingFace | ModelScope)
NuminaMath-TIR - A tool-integrated reasoning (TIR) dataset; TIR played a crucial role in the AIMO competition.
NuminaMath-CoT - Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner.
BAAI-TACO - A code-generation benchmark with 26,443 problems.
OpenThoughts-114k - An open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles.
Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
Clevr_CoGenT_TrainA_R1 - A multimodal dataset for training multimodal R1-style models.
clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
S1k - A dataset for training the S1 model.
中文基于满血DeepSeek-R1蒸馏数据集-110k (a 110k-sample Chinese dataset distilled from the full-size DeepSeek-R1) ModelScope | HuggingFace

Evaluation

Best practice for evaluating R1/o1-like reasoning models
MATH-500 - A subset of 500 problems from the MATH benchmark that OpenAI created in their Let's Verify Step by Step paper
AIME-2024 - This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024.
AIME-VALIDATION - All 90 problems come from AIME 2022, AIME 2023, and AIME 2024.
MATH-LEVEL-4 - A subset of level 4 problems from the MATH benchmark.
MATH-LEVEL-5 - A subset of level 5 problems from the MATH benchmark.
aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023.
GPQA-Diamond - The Diamond subset of the GPQA benchmark.
Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.
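Benchmarks such as MATH-500 and AIME-2024 are typically scored by extracting the model's final answer and comparing it with the reference. One common practice for R1/o1-like models is to sample several responses per problem and average their correctness to estimate pass@1. The sketch below is a deliberately simplified stand-in for a proper checker such as Math Verify (listed under Infra). It assumes answers are wrapped in `\boxed{...}` and compares them after whitespace removal only. The helper names `extract_boxed`, `normalize`, and `pass_at_1` are hypothetical, and a real evaluation needs symbolic equivalence checking (for example, treating `1/2` and `0.5` as equal).

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    # Only strips whitespace (assumption); a real checker parses and compares expressions.
    return "".join(ans.split())


def pass_at_1(samples: List[List[str]], references: List[str]) -> float:
    """Mean over problems of the fraction of sampled responses whose boxed answer matches."""
    per_problem = []
    for responses, ref in zip(samples, references):
        correct = 0
        for r in responses:
            pred = extract_boxed(r)
            if pred is not None and normalize(pred) == normalize(ref):
                correct += 1
        per_problem.append(correct / len(responses))
    return sum(per_problem) / len(per_problem)


# Usage: two problems, two sampled responses each.
samples = [
    ["... so the answer is \\boxed{\\frac{1}{2}}", "\\boxed{ \\frac{1}{2} }"],
    ["\\boxed{42}", "I think it is \\boxed{41}"],
]
refs = ["\\frac{1}{2}", "42"]
print(pass_at_1(samples, refs))  # 0.75
```

Averaging over several samples per problem reduces the variance that a single sampled response would introduce, which matters for small benchmarks such as AIME-2024.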

Others

Replicates of DeepSeek-R1 and DeepSeek-R1-Zero

HuggingFace Open R1
Simple Reinforcement Learning for Reasoning
oatllm
TinyZero
32B-DeepSeek-R1-Zero
R1-V - Multi-modal R1
Open-R1-Multimodal - A multimodal reasoning model based on OpenR1
R1-Multimodal-Journey - A journey to replicate multimodal reasoning model based on Open-R1-Multimodal
VLM-R1 | DEMO - A stable and generalizable R1-style Large Vision-Language Model
X-R1
Open-Reasoner-Zero

modelscope/awesome-deep-reasoning

Folders and files

Latest commit

History

Repository files navigation

Awesome-deep-reasoning

Table of Contents

News

Highlights

DeepSeek repos:

Qwen repos:

Papers

Models

Infra

Datasets

Evaluation

Others

Replicates of DeepSeek-R1 and DeepSeek-R1-Zero

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages