This repository contains the code for Dr. Zero: Self-Evolving Search Agents without Training Data. In this work, we introduce Dr. Zero, a framework enabling search agents to effectively self-evolve without any training data. In particular, we design a self-evolution feedback loop where a proposer generates diverse questions to train a solver initialized from the same base model. As the solver evolves, it incentivizes the proposer to produce increasingly difficult yet solvable tasks, thus establishing an automated curriculum to refine both agents. To enhance training efficiency, we also introduce hop-grouped relative policy optimization (HRPO). This method clusters structurally similar questions to construct group-level baselines, effectively minimizing the sampling overhead in evaluating each query's individual difficulty and solvability. Consequently, HRPO significantly reduces the compute requirements for solver training without compromising performance or stability. Extensive experimental results demonstrate that the data-free Dr. Zero matches or surpasses fully supervised search agents, showing that complex reasoning and search capabilities can emerge solely through self-evolution.
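As an illustration only, the snippet below shows one possible reading of HRPO's group-level baseline: rollouts are grouped by a structural key (here, the question's hop count, which is our assumption based on the method's name), and each reward is centered by its group mean. The actual grouping criterion, normalization, and integration into the policy-gradient update are defined in the paper and the training scripts, not here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def hop_grouped_advantages(rollouts: List[Tuple[int, float]]) -> List[float]:
    """Illustrative sketch: advantages with a hop-grouped baseline.

    Each rollout is a (hop_count, reward) pair; hop count stands in for the
    "structurally similar questions" that HRPO clusters. This is a simplified
    interpretation, not the repository's actual implementation.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for hops, reward in rollouts:
        groups[hops].append(reward)
    # Group-level baseline: mean reward over structurally similar questions.
    baselines = {h: sum(rs) / len(rs) for h, rs in groups.items()}
    return [reward - baselines[hops] for hops, reward in rollouts]

# Example: two 2-hop rollouts and two 3-hop rollouts.
print(hop_grouped_advantages([(2, 1.0), (2, 0.0), (3, 1.0), (3, 1.0)]))
# -> [0.5, -0.5, 0.0, 0.0]
```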
The core idea is to bootstrap a search agent from a base model (e.g., Qwen or Llama) through multiple iterations of data-free self-evolution and reinforcement learning in a multi-turn tool-using environment.
- Proposer: A question-generation agent that aims to create hard yet solvable questions, thereby driving the solver's improvement.
- Solver: The primary search agent that is trained with synthetic data from the proposer to answer challenging questions using the search tool.
- Zero-Data Initialization: The process starts with zero training data and relies solely on an external search engine (e.g., Wikipedia passage retriever).
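To make the loop concrete, here is a minimal structural sketch. The function names and signatures are hypothetical placeholders; the actual stages correspond to the iterN_challenger.sh, iterN_gen_data.sh, and iterN_solver.sh scripts described below.

```python
from typing import Any, Callable, List

def self_evolve(
    base_model: Any,
    train_proposer: Callable[[Any, Any], Any],       # e.g. wraps iterN_challenger.sh
    generate_questions: Callable[[Any], List[str]],  # e.g. wraps iterN_gen_data.sh
    train_solver: Callable[[Any, List[str]], Any],   # e.g. wraps iterN_solver.sh
    num_iterations: int = 3,
) -> Any:
    """Hypothetical outline of the proposer/solver self-evolution loop."""
    proposer, solver = base_model, base_model  # both agents start from the same base model
    for _ in range(num_iterations):
        # 1) Train the proposer to generate hard yet solvable questions,
        #    using the current solver to judge solvability.
        proposer = train_proposer(proposer, solver)
        # 2) Sample a synthetic training set from the updated proposer.
        questions = generate_questions(proposer)
        # 3) Train the solver with RL (GRPO/HRPO) on the synthetic questions.
        solver = train_solver(solver, questions)
    return solver
```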
Ensure you have a Python environment with the necessary dependencies (PyTorch, transformers, faiss-gpu, verl==0.5.0, etc.). The rest of the dependencies can be found here and here.
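For example, a minimal environment could be set up as follows; package versions other than verl==0.5.0 are not pinned by this snippet, so defer to the linked dependency lists.

```bash
pip install torch transformers faiss-gpu verl==0.5.0
```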
This framework relies on a local server with a retriever model. Prepare the corpus and build the index before training.
Download & Index Corpus:
Execute the following commands to download the English Wikipedia dump and build the FAISS index for the retriever (default: intfloat/e5-base-v2). More details can be found in the search folder and the Search-R1 repository.
```bash
save_path=./corpus
python scripts/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
```

The training process proceeds in iterations (Iter 1, Iter 2, Iter 3, ...). Each iteration consists of the following phases:
Before the first iteration, prepare the initial synthetic dataset for training and evaluation.
```bash
python process_train.py --local_dir ./data
python process_test.py --local_dir ./data
```

1. Train Proposer: Train the proposer agent to generate challenging yet manageable questions for the base solver.
```bash
bash iter1_challenger.sh
```

2. Generate Synthetic Data: Generate training data using the trained proposer model. Parameters such as the model path and sample size can be specified in the script.
```bash
bash iter1_gen_data.sh
```

3. Train Solver: Train the solver agent on the generated synthetic data using GRPO. This optimizes the solver's ability to search and reason over challenging questions.
```bash
bash iter1_solver.sh
```

4. Convert Solver to HF Checkpoint: Specify the trained model path and convert the FSDP checkpoint to the HF format. This allows the proposer to load the latest solver for reward estimation in the next training iteration.
```bash
bash convert.sh
```

Repeat the process using the scripts for the respective iteration. The model checkpoints from the previous iteration are used as the starting point for the next. You may need to modify the iteration number and model paths in the scripts.
iter2_challenger.sh -> iter2_gen_data.sh -> iter2_solver.sh -> convert.sh
iter3_challenger.sh -> iter3_gen_data.sh -> iter3_solver.sh -> convert.sh
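For example, the second iteration amounts to running its scripts in order (the third iteration is analogous):

```bash
bash iter2_challenger.sh   # train the proposer against the current solver
bash iter2_gen_data.sh     # generate synthetic training data with the new proposer
bash iter2_solver.sh       # train the solver on the generated data
bash convert.sh            # convert the FSDP checkpoint to HF format for the next iteration
```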
The code is released under a non-commercial license. See LICENSE for more details.
Please consider citing if you use our methods in your research:
```bibtex
@article{yue2026drzero,
  title={Dr. Zero: Self-Evolving Search Agents without Training Data},
  author={Yue, Zhenrui and Upasani, Kartikeya and Yang, Xianjun and Ge, Suyu and Nie, Shaoliang and Mao, Yuning and Liu, Zhe and Wang, Dong},
  journal={arXiv preprint arXiv:2601.07055},
  year={2026}
}
```
Our implementation is largely based on Search-R1 and VeRL. Many thanks to the authors of these projects for their great work!
