Off-Policy Deep Reinforcement Learning without Exploration

Code accompanying the paper "Off-Policy Deep Reinforcement Learning without Exploration". If you use our code, please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI Gym. Networks are trained using PyTorch 0.4 and Python 2.7.

Overview

The main algorithm, Batch-Constrained Q-learning (BCQ), can be found in BCQ.py.
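
For orientation, here is a minimal sketch of the action-selection step that BCQ uses, as described in the paper: sample a handful of candidate actions from a generative model fit to the batch, nudge each one with a small bounded perturbation, and pick the candidate with the highest Q-value. This is not the code in BCQ.py. The function names, the toy stand-ins, and the scalar state/action setup are all hypothetical and chosen only so the example runs without PyTorch.

```python
import random


def select_action(state, sample_candidates, perturb, q_value,
                  n_candidates=10, rng=None):
    """Pick an action for one state using batch-constrained selection.

    sample_candidates(state, n, rng) -> list of actions similar to the batch
    perturb(state, action)           -> action shifted by a small bounded amount
    q_value(state, action)           -> scalar value estimate
    """
    rng = rng or random.Random(0)
    candidates = sample_candidates(state, n_candidates, rng)
    perturbed = [perturb(state, a) for a in candidates]
    # Only the perturbed candidates are considered, never arbitrary actions.
    return max(perturbed, key=lambda a: q_value(state, a))


# Toy stand-ins (assumptions for illustration only).
def toy_sampler(state, n, rng):
    # Pretend the batch only contains actions close to 0.5.
    return [0.5 + rng.uniform(-0.1, 0.1) for _ in range(n)]


def toy_perturb(state, action, max_shift=0.05):
    # Move toward the state-dependent target, but by at most max_shift.
    shift = max(-max_shift, min(max_shift, state - action))
    return action + shift


def toy_q(state, action):
    # Value peaks when the action equals the state.
    return -(action - state) ** 2


action = select_action(0.9, toy_sampler, toy_perturb, toy_q)
print(action)
```

In this toy setup the Q-function prefers an action of 0.9, but the selected action stays near the sampled candidates, which is the point of the batch constraint.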

To reproduce results from the paper, first train an expert policy (DDPG) by running train_expert.py, which saves the expert model. Then collect a new buffer by running generate_buffer.py, either with the default settings or after adjusting the settings in the code.
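
The two steps above can be chained with a small driver script. This is a sketch, not part of the repository: the helper names build_commands and run_pipeline are hypothetical, and no command-line flags are assumed, so each script runs with whatever defaults are set in its own code.

```python
import subprocess
import sys

# Script names come from the README; the order matters because
# generate_buffer.py relies on the expert saved by train_expert.py.
PIPELINE = ["train_expert.py", "generate_buffer.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each step, and run it unless dry_run is set.

    check=True stops the pipeline at the first failing step.
    """
    commands = build_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False from the repository root
    # to actually execute the scripts.
    run_pipeline(dry_run=True)
```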

If you are interested in the standard forward RL tasks with DDPG or TD3, check out my other GitHub repository.


dmund95/bcq
