Distributed Distributional Deep Deterministic Policy Gradients (D4PG)

A Tensorflow implementation of a Distributed Distributional Deep Deterministic Policy Gradients (D4PG) network, for continuous control.

D4PG builds on the Deep Deterministic Policy Gradients (DDPG) approach (paper, code), making several improvements, including a distributional critic, distributed agents running on multiple threads to collect experiences, prioritised experience replay (PER), and N-step returns.
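As a sketch of one of these components, the N-step return replaces the one-step TD target with a sum of N discounted rewards plus a bootstrapped value at step N. The snippet below is illustrative only (not taken from this repo); `bootstrap_value` stands in for the target critic's estimate at step N.

```python
def n_step_return(rewards, gamma, bootstrap_value):
    """N-step bootstrapped target:
    R = r_0 + gamma*r_1 + ... + gamma^(N-1)*r_{N-1} + gamma^N * V(s_N).

    `rewards` holds the N rewards collected along the trajectory and
    `bootstrap_value` is the (target) critic's value estimate at step N.
    """
    ret = bootstrap_value
    # Fold the rewards in from the last step backwards, discounting as we go.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret
```

With `rewards=[1.0, 1.0, 1.0]`, `gamma=0.99` and `bootstrap_value=10.0`, this yields 1 + 0.99 + 0.99² + 0.99³·10.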

Trained on OpenAI Gym environments.

This implementation has been successfully trained and tested on the Pendulum-v0, BipedalWalker-v2 and LunarLanderContinuous-v2 environments. However, it can be run on any environment with a low-dimensional (non-image) state space and a continuous action space.
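A quick way to check whether an environment meets these assumptions is to inspect its spaces. The helper below is an illustrative sketch (not part of this repo), duck-typed on Gym-style space objects: Box (continuous) action spaces expose `low`/`high` bounds, while discrete ones do not.

```python
def is_compatible(observation_space, action_space):
    """Check that the spaces match this implementation's assumptions:
    a flat (1-D) observation vector and a continuous (Box-like) action
    space. Works with any Gym-style space objects.
    """
    low_dim = len(getattr(observation_space, "shape", ())) == 1
    continuous = hasattr(action_space, "low") and hasattr(action_space, "high")
    return low_dim and continuous
```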

This implementation currently holds the high score for the Pendulum-v0 environment on the OpenAI leaderboard.

Requirements

Note: The versions stated are the versions I used; other versions will likely still work.

Usage

The default environment is 'Pendulum-v0'. To use a different environment, simply change the ENV parameter in params.py before running the scripts below.
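For example, to train on BipedalWalker-v2 instead, the ENV setting in params.py would be changed along these lines. The exact layout of params.py may differ; the class-style grouping here is assumed from the `train_params.ENV` naming used elsewhere in this README.

```python
# params.py (excerpt -- layout assumed, attribute name from this README)
class train_params:
    ENV = 'BipedalWalker-v2'  # default is 'Pendulum-v0'
```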

To train the D4PG network, run

  $ python train.py

This will train the network on the specified environment and periodically save checkpoints to the /ckpts folder.

To test the saved checkpoints during training, run

  $ python test_every_new_ckpt.py

This should be run alongside the training script, allowing the latest checkpoints to be tested periodically as the network trains. It invokes the run_every_new_ckpt.sh shell script, which monitors the given checkpoint directory and runs the test.py script on the latest checkpoint each time a new one is saved. Test results are optionally saved to a text file in the /test_results folder.
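The monitoring loop in run_every_new_ckpt.sh can be pictured as a simple polling watcher like the Python sketch below. This is illustrative only: the actual script is a shell script, and the `.index` suffix assumes TensorFlow-style checkpoint file naming.

```python
import os
import time

def watch_ckpt_dir(ckpt_dir, on_new_ckpt, poll_secs=10.0, max_polls=None):
    """Poll `ckpt_dir` and call `on_new_ckpt(path)` once for every new
    checkpoint index file that appears. `max_polls` bounds the loop for
    testing; a real watcher would run until interrupted."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for fname in sorted(os.listdir(ckpt_dir)):
            # A new TF checkpoint is signalled by a fresh .index file.
            if fname.endswith(".index") and fname not in seen:
                seen.add(fname)
                on_new_ckpt(os.path.join(ckpt_dir, fname))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_secs)
```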

Once we have a trained network, we can visualise its performance in the environment by running

  $ python play.py

This will play the environment on screen using the trained network and save a GIF (optional).

Note: To reproduce the best 100-episode performance of -123.11 +/- 6.86 that achieved the top score on the 'Pendulum-v0' OpenAI leaderboard, run

  $ python test.py

specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py as Pendulum-v0 and Pendulum-v0.ckpt-660000, respectively.
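Concretely, the relevant settings in params.py would look something like this. The attribute names are taken from this README; the surrounding class layout is an assumption.

```python
# params.py (excerpt -- layout assumed, attribute names from this README)
class train_params:
    ENV = 'Pendulum-v0'

class test_params:
    CKPT_FILE = 'Pendulum-v0.ckpt-660000'
```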

Results

Result of training the D4PG on the 'Pendulum-v0' environment:

Result of training the D4PG on the 'LunarLanderContinuous-v2' environment:

Result of training the D4PG on the 'BipedalWalker-v2' environment:

Result of training the D4PG on the 'BipedalWalkerHardcore-v2' environment:

| Environment | Best 100-episode performance | Ckpt file |
| --- | --- | --- |
| Pendulum-v0 | -123.11 +/- 6.86 | ckpt-660000 |
| LunarLanderContinuous-v2 | 290.87 +/- 2.00 | ckpt-320000 |
| BipedalWalker-v2 | 304.62 +/- 0.13 | ckpt-940000 |
| BipedalWalkerHardcore-v2 | 256.29 +/- 7.08 | ckpt-8130000 |

All checkpoints for the above results are saved in the ckpts folder. The results can be reproduced by running python test.py, specifying the train_params.ENV and test_params.CKPT_FILE parameters in params.py for the desired environment and checkpoint file.

To-do

  • Train/test on further environments, including MuJoCo

References

License

MIT License
