Implementation of large-network designs in RL, with easy switching between toy tasks and challenging games. The code mainly follows three recent papers:
- 2020 ICML Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
- 2020 NeurIPS Workshop D2RL: Deep Dense Architectures in Reinforcement Learning
- 2021 arXiv Training Larger Networks for Deep Reinforcement Learning
In the code, we denote the method from Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? as `ofe`, the method from D2RL: Deep Dense Architectures in Reinforcement Learning as `d2rl`, and the method from Training Larger Networks for Deep Reinforcement Learning as `ofe_dense` (a minimal sketch of the `d2rl` and `ofe` ideas follows the table below). Note that we only implement the single-machine version of `ofe_dense`, and we observe overfitting with it; we speculate that this is because the single-machine version is not as stable as the distributed approach.
algorithm | continuous control | on-policy / off-policy |
---|---|---|
Proximal Policy Optimization (PPO) coupled with d2rl | ✅ | on-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | ✅ | off-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with ofe | ✅ | off-policy |
Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | ✅ | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | ✅ | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | ✅ | off-policy |
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | ✅ | off-policy |
Soft Actor-Critic (SAC) coupled with d2rl | ✅ | off-policy |
Soft Actor-Critic (SAC) coupled with ofe | ✅ | off-policy |
Soft Actor-Critic (SAC) coupled with ofe_dense | ✅ | off-policy |
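To make the two core ideas concrete, here is a minimal PyTorch sketch. It is not the code used in this repository; the class names, layer sizes, and depths are illustrative assumptions. `d2rl` re-concatenates the raw input into every hidden layer of the actor/critic MLPs, while `ofe` trains a separate online feature extractor (OFENet-style) with an auxiliary next-state prediction loss and feeds its higher-dimensional output to the agent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class D2RLActor(nn.Module):
    """d2rl-style actor: the raw state is concatenated into every hidden layer."""
    def __init__(self, state_dim, action_dim, hidden=256, depth=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = state_dim
        for _ in range(depth):
            self.layers.append(nn.Linear(in_dim, hidden))
            in_dim = hidden + state_dim            # dense skip connection from the input
        self.out = nn.Linear(hidden, action_dim)

    def forward(self, state):
        x = state
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x))
            if i < len(self.layers) - 1:
                x = torch.cat([x, state], dim=-1)  # re-inject the raw state
        return torch.tanh(self.out(x))

class OFEEncoder(nn.Module):
    """ofe-style online feature extractor (simplified): a densely connected MLP
    trained with an auxiliary loss that predicts the next state."""
    def __init__(self, state_dim, action_dim, hidden=64, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList()
        dim = state_dim
        for _ in range(depth):
            self.blocks.append(nn.Linear(dim, hidden))
            dim += hidden                          # DenseNet-style growth of the feature size
        self.feature_dim = dim
        self.predict_next = nn.Linear(dim + action_dim, state_dim)

    def forward(self, state):
        z = state
        for block in self.blocks:
            z = torch.cat([z, torch.relu(block(z))], dim=-1)
        return z                                   # higher-dimensional representation fed to the agent

    def aux_loss(self, state, action, next_state):
        z = self.forward(state)
        pred = self.predict_next(torch.cat([z, action], dim=-1))
        return F.mse_loss(pred, next_state)
```

In `ofe`, the agent then consumes `encoder(state)` (which already contains the raw state through the dense connections) instead of the raw state alone, which is where the "increasing input dimensionality" comes from; `ofe_dense` scales the same idea up with larger DenseNet-style networks (and, in the original paper, distributed training; see the caveat about our single-machine implementation above).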
# python 3.6 (apt)
# pytorch 1.4.0 (pip)
# tensorflow 1.14.0 (pip)
# DeepMind Control (DMC) Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker
For other dockerfiles, see RL Dockerfiles.
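After building, you can start a container from the image and run experiments inside it, e.g. `docker run -it --gpus all rl-docker /bin/bash` (the exact GPU flags depend on your Docker / NVIDIA runtime setup).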
Run experiments with the scripts `batch_run_main_d2rl_4seed_cuda.sh`, `batch_run_main_ofe_4seed_cuda.sh`, `batch_run_main_ofe_dense_4seed_cuda.sh`, or `batch_run_ppo_d2rl_4seed_cuda.sh`; as the names suggest, each script launches the chosen configuration with four random seeds:
# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True
bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0 # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0
# e.g. Note: `-l` sets the legend label, the positional argument (e.g. `data/DDPG_ofe-Hopper-v2/`) is the directory of collected results,
# and `-s` sets the smoothing window.
python spinupUtils/plot.py \
data/DDPG_ofe-Hopper-v2/ \
-l DDPG_ofe -s 10
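Assuming `spinupUtils/plot.py` follows the standard Spinning Up plotter interface, several result directories and labels can be passed at once to compare methods on a single figure, e.g. `python spinupUtils/plot.py data/TD3_ofe-Hopper-v2/ data/TD3_d2rl-Hopper-v2/ -l TD3_ofe TD3_d2rl -s 10` (directory names here are illustrative).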
Supported MuJoCo environments include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.
@misc{QingLi2021larger,
author = {Qing Li},
title = {Deeper and Larger Network Design for Continuous Control in RL},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}