
Direct Multi-Turn Preference Optimization for Language Agents


This repository contains the official code for our paper Direct Multi-Turn Preference Optimization for Language Agents. (EMNLP 2024 Main Conference)

Setup

You can set up the environment and download the data by running bash setup.sh.

Run

You can run the full DMPO pipeline with run_dmpo.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH>. The script performs three stages:

  • Training and evaluating the SFT model
  • Constructing the DMPO training dataset
  • Training and evaluating the DMPO model

Similarly, you can run run_dmpo_mistral.sh <DATASET> <BASIC_MODEL_PATH> <NEW_MODEL_SAVING_PATH> to train with a Mistral base model.
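For example, an invocation might look like the following. The dataset name and model paths below are illustrative placeholders, not values taken from this repository; substitute the dataset identifier and model paths that apply to your setup.

```shell
# One-time environment setup and data download.
bash setup.sh

# Full pipeline: SFT training/eval -> DMPO dataset construction -> DMPO training/eval.
# <DATASET>, base model path, and save path below are hypothetical examples.
bash run_dmpo.sh webshop \
    /path/to/base-llama-model \
    /path/to/save/dmpo-model

# Same pipeline with a Mistral base model (paths again illustrative).
bash run_dmpo_mistral.sh webshop \
    /path/to/base-mistral-model \
    /path/to/save/dmpo-mistral-model
```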

Citation

If you find this code useful, please cite our paper:

@misc{shi2024directmultiturnpreferenceoptimization,
      title={Direct Multi-Turn Preference Optimization for Language Agents}, 
      author={Wentao Shi and Mengqi Yuan and Junkang Wu and Qifan Wang and Fuli Feng},
      year={2024},
      eprint={2406.14868},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.14868}, 
}
