This project trains an agent to move through an environment, collecting yellow bananas while avoiding blue bananas.
Collecting a yellow banana gives a reward of +1; collecting a blue banana gives a reward of -1.
The state space has 37 dimensions, containing the agent's dynamic state (velocity, position, etc.) and ray-based perception of objects around the agent's forward direction.
The actions available are:
- 0 - move forward
- 1 - move backward
- 2 - turn left
- 3 - turn right
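The four action indices above can be sketched as an enumeration, together with one common way of choosing among them from Q-values. This is only an illustration: the names `Action` and `select_action` are hypothetical, and epsilon-greedy selection is an assumption, since this README does not say which exploration strategy train.py uses.

```python
import random
from enum import IntEnum


class Action(IntEnum):
    """Action indices as listed in this README (the class name is hypothetical)."""
    FORWARD = 0
    BACKWARD = 1
    LEFT = 2
    RIGHT = 3


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily.

    q_values holds one estimated value per action, indexed like Action.
    Epsilon-greedy is an assumption here, not a description of train.py.
    """
    if len(q_values) != len(Action):
        raise ValueError("expected one Q-value per action")
    if rng.random() < epsilon:
        # Explore: pick any of the four actions uniformly at random.
        return Action(rng.randrange(len(Action)))
    # Exploit: pick the action with the highest estimated value.
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return Action(best_index)
```

With `epsilon=0.0` the choice is purely greedy, so Q-values of `[0.1, 0.5, 0.2, 0.0]` select `Action.BACKWARD` (index 1).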
The code uses Python 3.6. To install the required dependencies, follow these instructions:
- Clone the DRLND GitHub repository and follow the instructions in its README.md file to install PyTorch, the ML-Agents toolkit, and a few more Python packages.
- Download the Unity Environment and unzip it in a convenient directory:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- The files train.py and play.py each define the path to the Unity Environment at the beginning of the file. Edit that path so that it points to where you unzipped the environment.
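As a rough sketch of the last step, the path could be kept in a single constant and checked before the environment is launched, so that a wrong path fails with a clear message. The constant `ENV_PATH`, its placeholder value, and the helper `check_env_path` are all hypothetical; the actual variable name used in train.py and play.py may differ.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the location of the unzipped environment.
ENV_PATH = "/path/to/unzipped/environment"


def check_env_path(path):
    """Return the path as a Path object, or raise if nothing exists there."""
    resolved = Path(path).expanduser()
    if not resolved.exists():
        raise FileNotFoundError(
            "Unity Environment not found at {}; edit the path at the "
            "beginning of train.py and play.py".format(resolved)
        )
    return resolved

# Typical use, before creating the environment:
#     env_file = check_env_path(ENV_PATH)
```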
To train the agent, run the train.py file. Before running it, edit the path to the banana environment at the beginning of the file.
train.py trains the Q-Network and saves the network weights to the 'checkpoint.pth' file once the goal of an average reward of 13 is reached.
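The stopping rule can be illustrated with a small moving-average tracker. This is a minimal sketch, not the code in train.py: the class name `SolveTracker` is hypothetical, and the window of 100 episodes is an assumption, since this README gives only the goal of 13.

```python
from collections import deque

GOAL_AVERAGE = 13.0  # goal stated in this README
WINDOW = 100         # assumption: number of recent episodes to average over


class SolveTracker:
    """Tracks recent episode scores and reports when the goal is reached."""

    def __init__(self, goal=GOAL_AVERAGE, window=WINDOW):
        self.goal = goal
        # A bounded deque silently drops the oldest score once it is full.
        self.scores = deque(maxlen=window)

    def average(self):
        """Mean of the stored scores, or 0.0 before any episode has finished."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def add(self, score):
        """Record one episode score and return True if the goal is now met."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and self.average() >= self.goal
```

In a training loop, the first time `add` returns `True` would be the point at which the weights are written out. In PyTorch that is typically done with `torch.save(qnetwork.state_dict(), 'checkpoint.pth')`, where `qnetwork` is a hypothetical name for the Q-Network object.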
The play.py script loads the checkpoint.pth weights into the network and plays the environment. Don't forget to edit the path to the banana environment in this file as well.