
Deeper and Larger Network Design for Continuous Control in RL

PyTorch implementation of deeper and larger network designs for continuous-control RL, with easy switching between toy tasks and challenging games. The code mainly follows three recent papers:

  • Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?
  • D2RL: Deep Dense Architectures in Reinforcement Learning
  • Training Larger Networks for Deep Reinforcement Learning

In the code, we denote the method from Can Increasing Input Dimensionality Improve Deep Reinforcement Learning? as ofe, the method from D2RL: Deep Dense Architectures in Reinforcement Learning as d2rl, and the method from Training Larger Networks for Deep Reinforcement Learning as ofe_dense. Note that we only implement the single-machine approach for ofe_dense, and we observe overfitting with it; we speculate that this is because the single-machine version is not as stable as the distributed approach.
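To make the d2rl idea concrete, below is a minimal PyTorch sketch of a dense MLP in which the raw input is concatenated to the output of every hidden layer before the next linear layer, which is the core of D2RL. The class name, layer sizes, and the example dimensions are illustrative and are not the exact modules used in this repo.

import torch
import torch.nn as nn

class D2RLMLP(nn.Module):
    """Dense MLP in the d2rl style: the raw input is concatenated to
    the features produced by every hidden layer (illustrative sketch)."""
    def __init__(self, in_dim, out_dim, hidden=256, n_layers=4):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden)]
        for _ in range(n_layers - 1):
            # every layer after the first sees [hidden features, raw input]
            layers.append(nn.Linear(hidden + in_dim, hidden))
        self.hidden_layers = nn.ModuleList(layers)
        self.out = nn.Linear(hidden, out_dim)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.hidden_layers[0](x))
        for layer in self.hidden_layers[1:]:
            h = self.act(layer(torch.cat([h, x], dim=-1)))
        return self.out(h)

# e.g. a deterministic actor for a continuous-control task (dimensions are hypothetical)
actor = D2RLMLP(in_dim=17, out_dim=6)
action = torch.tanh(actor(torch.randn(1, 17)))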

Supported algorithms

algorithm | continuous control | on-policy / off-policy
Proximal Policy Optimization (PPO) coupled with d2rl | ✔ | on-policy
Deep Deterministic Policy Gradients (DDPG) coupled with d2rl | ✔ | off-policy
Deep Deterministic Policy Gradients (DDPG) coupled with ofe | ✔ | off-policy
Deep Deterministic Policy Gradients (DDPG) coupled with ofe_dense | ✔ | off-policy
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with d2rl | ✔ | off-policy
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe | ✔ | off-policy
Twin Delayed Deep Deterministic Policy Gradients (TD3) coupled with ofe_dense | ✔ | off-policy
Soft Actor-Critic (SAC) coupled with d2rl | ✔ | off-policy
Soft Actor-Critic (SAC) coupled with ofe | ✔ | off-policy
Soft Actor-Critic (SAC) coupled with ofe_dense | ✔ | off-policy

Instructions

Recommended: run with Docker

# python        3.6    (apt)
# pytorch       1.4.0  (pip)
# tensorflow    1.14.0 (pip)
# DMC Control Suite and MuJoCo
cd dockerfiles
docker build . -t rl-docker

For other dockerfiles, see RL Dockerfiles.

Launch experiments

Run with the scripts batch_run_main_d2rl_4seed_cuda.sh / batch_run_main_ofe_4seed_cuda.sh / batch_run_main_ofe_dense_4seed_cuda.sh / batch_run_ppo_d2rl_4seed_cuda.sh:

# e.g.
bash batch_run_main_ofe_4seed_cuda.sh Ant-v2 TD3_ofe 0 True # env_name: Ant-v2, algorithm: TD3_ofe, CUDA_Num: 0, layer_norm: True

bash batch_run_ppo_d2rl_4seed_cuda.sh Ant-v2 PPO_d2rl 0 # env_name: Ant-v2, algorithm: PPO_d2rl, CUDA_Num: 0

Plot results

# e.g. Notice: `-l` denotes the label, `data/DDPG_ofe-Hopper-v2/` is the directory of collected results,
# and `-s` sets the smoothing value.
python spinupUtils/plot.py \
    data/DDPG_ofe-Hopper-v2/ \
    -l DDPG_ofe -s 10
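The `-s` option applies moving-average smoothing to the logged return curve (as in Spinning Up's plot utilities). As a rough standalone equivalent, the sketch below loads one run's progress file and smooths its return column; the file path, separator, and the AverageEpRet column name follow the Spinning Up logging convention and are assumptions here, not guaranteed details of this repo.

import numpy as np
import pandas as pd

def smooth(y, window=10):
    # flat-kernel moving average, similar in spirit to plot.py's `-s` smoothing
    if window <= 1:
        return np.asarray(y, dtype=float)
    kernel = np.ones(window)
    norm = np.convolve(np.ones(len(y)), kernel, mode="same")
    return np.convolve(y, kernel, mode="same") / norm

# hypothetical usage: smooth the return curve of a single seed's run
df = pd.read_csv("data/DDPG_ofe-Hopper-v2/DDPG_ofe-Hopper-v2_s0/progress.txt", sep="\t")
returns_smoothed = smooth(df["AverageEpRet"].values, window=10)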

Performance on MuJoCo

Environments include Ant-v2, HalfCheetah-v2, Hopper-v2, Humanoid-v2, and Walker2d-v2.

  • DDPG and its variants

  • TD3 and its variants

  • SAC and its variants

  • PPO and its variants

Citation

@misc{QingLi2021larger,
  author = {Qing Li},
  title = {Deeper and Larger Network Design for Continuous Control in RL},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LQNew/Deeper_Larger_Actor-Critic_RL}}
}
