illidanlab/rpg

Ranking Policy Gradient

Ranking Policy Gradient (RPG) is a sample-efficient off-policy policy gradient method that learns the optimal ranking of actions in order to maximize the return. RPG has the following practical advantages:

  • It is a sample-efficient model-free algorithm for learning deterministic policies.
  • Any exploration algorithm can be incorporated with little effort to further improve the sample efficiency of RPG.
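
To make the idea of "ranking actions" concrete, here is a minimal NumPy sketch. It is not taken from this codebase. It assumes a pairwise logistic form in which the probability that action i outranks action j is sigmoid(score_i - score_j), and that the deterministic policy picks the top-ranked action. The function names and the example scores are hypothetical.

```python
import numpy as np


def pairwise_rank_probs(scores):
    """Return p where p[i, j] = sigmoid(scores[i] - scores[j]).

    Assumption: p[i, j] is read as the probability that action i
    is ranked above action j.
    """
    diff = scores[:, None] - scores[None, :]
    return 1.0 / (1.0 + np.exp(-diff))


def top_rank_scores(scores):
    """Product over j != i of p[i, j]: an unnormalized score for
    ranking action i above every other action."""
    p = pairwise_rank_probs(scores)
    np.fill_diagonal(p, 1.0)  # drop the i == j terms from the product
    return p.prod(axis=1)


def greedy_action(scores):
    """Deterministic policy: choose the action with the highest ranking score."""
    return int(np.argmax(scores))


# Hypothetical ranking scores for three actions in one state.
scores = np.array([0.2, 1.5, -0.3])
probs = pairwise_rank_probs(scores)
best = greedy_action(scores)
```

Under this assumed form, the action with the highest score also has the largest product of pairwise probabilities, so ranking by raw scores and ranking by `top_rank_scores` select the same action.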

This codebase implements RPG on top of the Dopamine framework. A preprint of the RPG paper is available on arXiv (arXiv:1906.09674).

Instructions

Install via source

Step 1.

Follow the installation instructions of the Dopamine framework for Ubuntu or Mac OS X.

Step 2.

Download the RPG source, i.e.

git clone git@github.com:illidanlab/rpg.git

Running the tests

cd ./rpg/dopamine 
python -um dopamine.atari.train \
  --agent_name=rpg \
  --base_dir=/tmp/dopamine \
  --random_seed=1 \
  --game_name=Pong \
  --gin_files='dopamine/agents/rpg/configs/rpg.gin'

Reproduce

To reproduce the results in the paper, please refer to the instructions here.

Reference

If you use this RPG implementation in your work, please consider citing the following paper:

@article{lin2019ranking,
  title={Ranking Policy Gradient},
  author={Lin, Kaixiang and Zhou, Jiayu},
  journal={arXiv preprint arXiv:1906.09674},
  year={2019}
}