Distributed Distributional Deep Deterministic Policy Gradients (D4PG)

A Tensorflow implementation of a Distributed Distributional Deep Deterministic Policy Gradients (D4PG) network, for continuous control.

D4PG builds on the Deep Deterministic Policy Gradients (DDPG) approach (paper, code), making several improvements, including a distributional critic, distributed agents running on multiple threads to collect experiences, prioritised experience replay (PER), and N-step returns.
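As a sketch of one of these components, the N-step return replaces the one-step TD target with a sum of N discounted rewards plus a bootstrapped value at step N. The snippet below is illustrative only (not taken from this repo); `bootstrap_value` stands in for the target critic's estimate at step N.

```python
def n_step_return(rewards, gamma, bootstrap_value):
    """N-step bootstrapped target:
    R = r_0 + gamma*r_1 + ... + gamma^(N-1)*r_{N-1} + gamma^N * V(s_N).

    `rewards` holds the N rewards collected along the trajectory and
    `bootstrap_value` is the (target) critic's value estimate at step N.
    """
    ret = bootstrap_value
    # Fold the rewards in from the last step backwards, discounting as we go.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```

With `rewards=[1.0, 1.0, 1.0]`, `gamma=0.99` and `bootstrap_value=10.0`, this yields 1 + 0.99 + 0.99² + 0.99³·10.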

Trained on OpenAI Gym environments.

This implementation has been successfully trained and tested on the Pendulum-v0, BipedalWalker-v2 and LunarLanderContinuous-v2 environments. However, it can be run on any environment with a low-dimensional (non-image) state space and a continuous action space.
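A quick way to check whether an environment meets these assumptions is to inspect its spaces. The helper below is an illustrative sketch (not part of this repo), duck-typed on Gym-style space objects: Box (continuous) action spaces expose `low`/`high` bounds, while discrete ones do not.

```python
def is_compatible(observation_space, action_space):
    """Check that the spaces match this implementation's assumptions:
    a flat (1-D) observation vector and a continuous (Box-like) action
    space. Works with any Gym-style space objects.
    """
    low_dim = len(getattr(observation_space, "shape", ())) == 1
    continuous = hasattr(action_space, "low") and hasattr(action_space, "high")
    return low_dim and continuous
```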

This implementation currently holds the high score for the Pendulum-v0 environment on the OpenAI leaderboard.

Requirements

Note: The versions stated are the versions I used; other versions will likely still work.

Usage

The default environment is 'Pendulum-v0'. To use a different environment, simply change the ENV parameter in params.py before running the scripts below.
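For example, to train on BipedalWalker-v2 instead, the ENV setting in params.py would be changed along these lines. The exact layout of params.py may differ; the class-style grouping here is assumed from the `train_params.ENV` naming used elsewhere in this README.

```python
# params.py (excerpt -- layout assumed, attribute name from this README)
class train_params:
    ENV = 'BipedalWalker-v2'  # default is 'Pendulum-v0'
```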

To train the D4PG network, run

  $ python train.py

This will train the network on the specified environment and periodically save checkpoints to the /ckpts folder.

To test the saved checkpoints during training, run

  $ python test_every_new_ckpt.py

This should be run alongside the training script, allowing the latest checkpoints to be tested periodically as the network trains. It invokes the run_every_new_ckpt.sh shell script, which monitors the given checkpoint directory and runs the test.py script on the latest checkpoint each time a new one is saved. Test results are optionally saved to a text file in the /test_results folder.
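The monitoring loop in run_every_new_ckpt.sh can be pictured as a simple polling watcher like the Python sketch below. This is illustrative only: the actual script is a shell script, and the `.index` suffix assumes TensorFlow-style checkpoint file naming.

```python
import os
import time

def watch_ckpt_dir(ckpt_dir, on_new_ckpt, poll_secs=10.0, max_polls=None):
    """Poll `ckpt_dir` and call `on_new_ckpt(path)` once for every new
    checkpoint index file that appears. `max_polls` bounds the loop for
    testing; a real watcher would run until interrupted."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for fname in sorted(os.listdir(ckpt_dir)):
            # A new TF checkpoint is signalled by a fresh .index file.
            if fname.endswith(".index") and fname not in seen:
                seen.add(fname)
                on_new_ckpt(os.path.join(ckpt_dir, fname))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_secs)
```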

Once we have a trained network, we can visualise its performance in the environment by running

  $ python play.py

This will play the environment on screen using the trained network and save a GIF (optional).

Note: To reproduce the best 100-episode performance of -123.11 +/- 6.86 that achieved the top score on the 'Pendulum-v0' OpenAI leaderboard, run

  $ python test.py

specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py as Pendulum-v0 and Pendulum-v0.ckpt-660000, respectively.
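Concretely, the relevant settings in params.py would look something like this. The attribute names are taken from this README; the surrounding class layout is an assumption.

```python
# params.py (excerpt -- layout assumed, attribute names from this README)
class train_params:
    ENV = 'Pendulum-v0'

class test_params:
    CKPT_FILE = 'Pendulum-v0.ckpt-660000'
```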

Results

Result of training the D4PG on the 'Pendulum-v0' environment:

Result of training the D4PG on the 'LunarLanderContinuous-v2' environment:

Result of training the D4PG on the 'BipedalWalker-v2' environment:

Result of training the D4PG on the 'BipedalWalkerHardcore-v2' environment:

| Environment | Best 100-episode performance | Ckpt file |
| --- | --- | --- |
| Pendulum-v0 | -123.11 +/- 6.86 | ckpt-660000 |
| LunarLanderContinuous-v2 | 290.87 +/- 2.00 | ckpt-320000 |
| BipedalWalker-v2 | 304.62 +/- 0.13 | ckpt-940000 |
| BipedalWalkerHardcore-v2 | 256.29 +/- 7.08 | ckpt-8130000 |

All checkpoints for the above results are saved in the ckpts folder. The results can be reproduced by running python test.py, specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py for the desired environment and checkpoint file.

To-do

  • Train/test on further environments, including MuJoCo

References

License

MIT License
