🔍 Codebase for the ICML '20 paper "Ready Policy One: World Building Through Active Learning" (arxiv: 2002.02693)

philipjball/ReadyPolicyOne

Ready Policy One (RP1)

Code to complement "Ready Policy One: World Building Through Active Learning" (ICML 2020).

Trains an agent inside an ensemble of dynamics models.
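The general idea can be illustrated with a toy sketch (this is an illustration of model-ensemble training in general, not the repo's implementation; `LinearDynamics`, `imagined_rollout`, and the policy interface are all hypothetical names):

```python
import random

class LinearDynamics:
    """Toy one-dimensional dynamics model: next_state = a*state + b*action."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def step(self, state, action):
        return self.a * state + self.b * action

def imagined_rollout(ensemble, policy, state, horizon, rng=random):
    """Roll a policy through the ensemble, sampling one member per step.

    Sampling a different model at each step is one common way to
    propagate model uncertainty into the imagined trajectory.
    """
    traj = [state]
    for _ in range(horizon):
        model = rng.choice(ensemble)      # pick an ensemble member
        state = model.step(state, policy(state))
        traj.append(state)
    return traj
```

With a single identity model and a zero-action policy, the rollout simply repeats the initial state, which makes the mechanics easy to check.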

General instructions

All the experiment configurations used in the paper live in the args_yml directory. On the machines we trained on, we could run 5 seeds concurrently, so the top-level script run_experiments.py launches 5 runs at once, with a flag (--seeds5to9) that toggles between seeds 0-4 and 5-9.

To run the HalfCheetah Ready Policy One experiments for seeds 5-9, type the following:

python run_experiments.py --yaml ./args_yml/main_exp/halfcheetah-rp1.yml --seeds5to9
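A minimal sketch of what such a seed launcher might look like (hypothetical; the actual run_experiments.py, the train.py entry point, and the --seed flag here are assumptions for illustration):

```python
import subprocess

def seed_range(seeds5to9: bool) -> range:
    """Seeds 0-4 by default, 5-9 when the flag is set."""
    return range(5, 10) if seeds5to9 else range(0, 5)

def launch(yaml_path: str, seeds5to9: bool = False, dry_run: bool = True):
    """Build one command per seed; launch them concurrently unless dry_run.

    Returns the list of commands so callers can inspect what would run.
    """
    cmds = [
        ["python", "train.py", "--yaml", yaml_path, "--seed", str(s)]
        for s in seed_range(seeds5to9)
    ]
    if not dry_run:
        procs = [subprocess.Popen(c) for c in cmds]  # 5 concurrent runs
        for p in procs:
            p.wait()
    return cmds
```

The dry-run default makes the command construction easy to verify before actually spawning five training processes.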

Citation

@article{rpone2020,
  title   = {Ready Policy One: World Building Through Active Learning},
  author  = {Ball, Philip and Parker-Holder, Jack and Pacchiano, Aldo and Choromanski, Krzysztof and Roberts, Stephen},
  journal = {Proceedings of the 37th International Conference on Machine Learning},
  year    = {2020}
}

FAQs

Why does model-free training run so slowly?

Two reasons: 1) it is not parallelised; 2) this code tries to use GPUs where possible, so try forcing it to run on the CPU.
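One common way to force CPU execution (an assumption about this codebase; it may also expose its own device option) is to hide the GPUs from CUDA-aware libraries such as PyTorch before they initialise:

```python
import os

def force_cpu():
    """Hide all GPUs so CUDA-aware libraries fall back to the CPU.

    Must run before the first CUDA initialisation (e.g. before importing
    torch or calling any torch.cuda function) to take effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

force_cpu()
```

Equivalently, set the variable on the command line: CUDA_VISIBLE_DEVICES="" python run_experiments.py ...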

Acknowledgements

The authors thank Nikhil Barhate for his PPO-PyTorch repository; the ppo.py file here is a heavily modified version of that code.
