diff --git a/docs/index.md b/docs/index.md
index 8eb2fab83c..c030981f8f 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -62,6 +62,7 @@ and how to implement new MDPs and new algorithms.
    RL2
    SAC
    TD3
+   TEPPO
    TRPO
    REINFORCE
 
diff --git a/docs/user/algo_teppo.md b/docs/user/algo_teppo.md
new file mode 100644
index 0000000000..605f8b5551
--- /dev/null
+++ b/docs/user/algo_teppo.md
@@ -0,0 +1,76 @@
+# Proximal Policy Optimization with Task Embedding (TEPPO)
+
+
+```eval_rst
+.. list-table::
+   :header-rows: 0
+   :stub-columns: 1
+   :widths: auto
+
+   * - **Paper**
+     - Learning an Embedding Space for Transferable Robot Skills :cite:`hausman2018learning`
+   * - **Framework(s)**
+     - .. figure:: ./images/tf.png
+          :scale: 20%
+          :class: no-scaled-link
+
+          TensorFlow
+   * - **API Reference**
+     - `garage.tf.algos.TEPPO <../_autoapi/garage/tf/algos/index.html#garage.tf.algos.TEPPO>`_
+   * - **Code**
+     - `garage/tf/algos/te_ppo.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/te_ppo.py>`_
+   * - **Examples**
+     - :ref:`te_ppo_metaworld_mt1_push`, :ref:`te_ppo_metaworld_mt10`, :ref:`te_ppo_metaworld_mt50`, :ref:`te_ppo_point`
+```
+
+Proximal Policy Optimization (PPO) is a family of policy gradient methods that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent. TEPPO parameterizes the PPO policy via a shared skill embedding space: a task encoder maps each task to a latent skill vector on which the policy is conditioned, and an inference network is trained to recover that latent from sampled trajectories, making the learned skills reusable across tasks.
+
+## Default Parameters
+
+```py
+discount=0.99,
+gae_lambda=0.98,
+lr_clip_range=0.01,
+max_kl_step=0.01,
+policy_ent_coeff=1e-3,
+encoder_ent_coeff=1e-3,
+inference_ce_coeff=1e-3,
+```
+
+## Examples
+
+### te_ppo_metaworld_mt1_push
+
+```eval_rst
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt1_push.py
+```
+
+### te_ppo_metaworld_mt10
+
+```eval_rst
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt10.py
+```
+
+### te_ppo_metaworld_mt50
+
+```eval_rst
+.. literalinclude:: ../../examples/tf/te_ppo_metaworld_mt50.py
+```
+
+### te_ppo_point
+
+```eval_rst
+.. literalinclude:: ../../examples/tf/te_ppo_point.py
+```
+
+## References
+
+```eval_rst
+.. bibliography:: references.bib
+   :style: unsrt
+   :filter: docname in docnames
+```
+
+----
+
+*This page was authored by Nicole Shin Ying Ng ([@nicolengsy](https://github.com/nicolengsy)).*
diff --git a/docs/user/references.bib b/docs/user/references.bib
index 740c4f71fa..87a37a1b80 100644
--- a/docs/user/references.bib
+++ b/docs/user/references.bib
@@ -83,6 +83,14 @@ @article{yu2019metaworld
   journal={arXiv:1910.10897},
 }
 
+@inproceedings{hausman2018learning,
+  title={Learning an Embedding Space for Transferable Robot Skills},
+  author={Hausman, Karol and Springenberg, Jost Tobias and Wang, Ziyu and Heess, Nicolas and Riedmiller, Martin},
+  booktitle={International Conference on Learning Representations},
+  year={2018},
+  url={https://openreview.net/forum?id=rk07ZXZRb},
+}
+
 @article{lillicrap2015continuous,
   title={Continuous control with deep reinforcement learning},
   author={Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan},
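
The defaults documented above are keyword arguments to the `TEPPO` constructor. As a rough orientation, the sketch below shows where they plug in; the `env`, `policy`, `baseline`, and `inference` objects are placeholders standing in for the full setup in the bundled examples (see `te_ppo_point.py`), and any constructor argument not listed among the defaults should be checked against the API reference.

```py
# Sketch only: wires the documented default hyperparameters into
# garage.tf.algos.TEPPO. The env/policy/baseline/inference objects are
# placeholders; build them as in examples/tf/te_ppo_point.py.
from garage.tf.algos import TEPPO

algo = TEPPO(
    env_spec=env.spec,        # placeholder: a (wrapped) multi-task environment
    policy=policy,            # placeholder: a task-embedding policy
    baseline=baseline,        # placeholder: a value baseline
    inference=inference,      # placeholder: network inferring the latent skill
    discount=0.99,            # documented default
    gae_lambda=0.98,          # documented default
    lr_clip_range=0.01,       # documented default
    max_kl_step=0.01,         # documented default
    policy_ent_coeff=1e-3,    # documented default
    encoder_ent_coeff=1e-3,   # documented default
    inference_ce_coeff=1e-3,  # documented default
)
```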