This repository contains the official code for our paper *Direct Multi-Turn Preference Optimization for Language Agents* (EMNLP 2024 Main Conference).
You can set up the environment and download the data by running `bash setup.sh`.
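For example, assuming a conda-based workflow (the environment name and Python version below are arbitrary choices, not requirements of this repo):

```bash
# Create and activate a fresh environment (name/version are arbitrary), then run the provided setup script.
conda create -n dmpo python=3.10 -y
conda activate dmpo
bash setup.sh
```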
You can complete the DMPO pipeline by running `bash run_dmpo.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH>`; an example invocation is shown after the list below. The script contains three sections:
- Training and evaluating the SFT model
- Constructing the DMPO training dataset
- Training and evaluating the DMPO model
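For example (the dataset name and paths below are placeholders; substitute the dataset and base checkpoint you actually use):

```bash
# Placeholder arguments -- replace with your own dataset name and model paths.
bash run_dmpo.sh webshop /path/to/base_model /path/to/save/dmpo_model
```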
Similarly, you can run `bash run_dmpo_mistral.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH>` to perform training with a Mistral model.
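For example (again with placeholder arguments):

```bash
# Placeholder arguments -- replace with your dataset name and a Mistral checkpoint/save path.
bash run_dmpo_mistral.sh alfworld /path/to/mistral_base_model /path/to/save/dmpo_mistral_model
```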
If you find this code useful, please cite our paper:
```bibtex
@misc{shi2024directmultiturnpreferenceoptimization,
      title={Direct Multi-Turn Preference Optimization for Language Agents},
      author={Wentao Shi and Mengqi Yuan and Junkang Wu and Qifan Wang and Fuli Feng},
      year={2024},
      eprint={2406.14868},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.14868},
}
```