This is a trained model of a PPO agent playing LunarLander-v2 using the stable-baselines3 library.
Using Google Colab, I trained my first deep reinforcement learning agent: a Lunar Lander agent that learns to land correctly on the moon with Stable-Baselines3.
I trained the agent for 1,000,000 timesteps. It reached a mean reward of 206.92 +/- 53.53 (mean +/- standard deviation of the episode rewards over the evaluation episodes).
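The score above is a mean and standard deviation of per-episode returns. The sketch below shows that arithmetic in plain Python. It assumes the score came from SB3's `evaluate_policy`, which computes these values with `np.mean` and `np.std` (population standard deviation, `ddof=0`), so it uses `statistics.pstdev` to match. The helper names and the episode returns are hypothetical illustrations, not this agent's real evaluation data, and the number of evaluation episodes behind 206.92 +/- 53.53 is not stated here.

```python
from statistics import fmean, pstdev


def summarize_returns(episode_returns):
    """Return (mean, std) of per-episode returns.

    Uses the population standard deviation, matching numpy.std's default
    (ddof=0), which is assumed to be how the reported score was computed.
    """
    if not episode_returns:
        raise ValueError("need at least one episode return")
    return fmean(episode_returns), pstdev(episode_returns)


def format_score(mean, std):
    """Format a score as 'mean +/- std' with two decimals, like this card."""
    return f"{mean:.2f} +/- {std:.2f}"


if __name__ == "__main__":
    # Hypothetical returns from three evaluation episodes (not real data).
    returns = [200.0, 250.0, 150.0]
    mean, std = summarize_returns(returns)
    print(format_score(mean, std))  # 200.00 +/- 40.82
```

A larger standard deviation means the agent's landings are less consistent from episode to episode, even when the mean is high.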
To improve the model:
- Train for more timesteps
- Try different hyperparameters for PPO. See: https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#parameters
- Try another algorithm, such as DQN
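One way to act on the hyperparameter suggestion is to enumerate a small grid of overrides and train one PPO model per combination. The sketch below only builds the keyword-argument dictionaries. The keys (`n_steps`, `batch_size`, `gamma`, `ent_coef`) are real arguments of SB3's `PPO` constructor, but the candidate values and the helper `candidate_ppo_kwargs` are hypothetical choices for illustration, not the settings used to train this model. It skips combinations where `batch_size` does not evenly divide the rollout buffer size `n_steps * n_envs`, since SB3 warns that such settings produce a truncated final mini-batch.

```python
from itertools import product

# Hypothetical search space: the keys are real stable-baselines3 PPO arguments,
# but the candidate values are illustrative, not tuned for LunarLander-v2.
SEARCH_SPACE = {
    "n_steps": [1024, 2048],
    "batch_size": [64, 96],
    "gamma": [0.99, 0.999],
    "ent_coef": [0.0, 0.01],
}


def candidate_ppo_kwargs(search_space, n_envs=1):
    """Yield one kwargs dict per combination in the search space.

    Combinations where batch_size does not evenly divide the rollout
    buffer (n_steps * n_envs) are skipped, because PPO would otherwise
    train on a truncated final mini-batch.
    """
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        kwargs = dict(zip(keys, values))
        buffer_size = kwargs["n_steps"] * n_envs
        if buffer_size % kwargs["batch_size"] != 0:
            continue
        yield kwargs


if __name__ == "__main__":
    for kwargs in candidate_ppo_kwargs(SEARCH_SPACE):
        # Each dict could be passed as PPO("MlpPolicy", env, **kwargs).
        print(kwargs)
```

With a single environment, only `batch_size=64` divides both rollout sizes, so 8 of the 16 combinations remain; comparing them fairly means evaluating each trained model on the same number of episodes.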
Environment: LunarLander-v2
Library: stable-baselines3
Model: Proximal Policy Optimization (PPO)
Mean Reward +/- Std. Dev.: 206.92 +/- 53.53