This project demonstrates the generalizability of Deep Q-Learning for learning control policies directly from the visual output of different environments. The model learns an end-to-end mapping from rendered 2-D game frames to the next action (control signals such as move up or move right) so as to maximize the cumulative reward (score). During the learning phase, the model is trained while simultaneously making a prediction at every step (frame to action).
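At its core, Deep Q-Learning regresses a neural network Q(s, a) towards the Bellman target r + γ·max_a' Q(s', a') while choosing actions ε-greedily. The snippet below is a minimal, generic sketch of those two ideas; the function and parameter names are illustrative assumptions and are not taken from this repository's code.

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, n_actions):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(q_values))

def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets r + gamma * max_a' Q(s', a'), cut off at episode end."""
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
```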
| Model | Performance | Episode 20 | Episode 200 | Train Score / episode |
|---|---|---|---|---|
| FCNN | Open TensorBoard | ![]() | ![]() | ![]() |
| CNN | Open TensorBoard | ![]() | ![]() | ![]() |
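The two models compared above are a fully connected network (FCNN) acting on a flattened observation and a convolutional network (CNN) acting on stacked image frames. The Keras sketch below only illustrates what such Q-networks typically look like; the layer sizes and names are assumptions, not the architectures used in this repository.

```python
import tensorflow as tf

def build_fcnn_q_network(obs_dim, n_actions):
    """Fully connected Q-network for a flat observation vector (assumed sizes)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(obs_dim,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_actions),  # one Q-value per action
    ])

def build_cnn_q_network(frame_shape, n_actions):
    """Convolutional Q-network for stacked game frames (assumed sizes)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=frame_shape),  # e.g. (84, 84, 4)
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(n_actions),
    ])
```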
First, on your local machine, run:

```
python train_master.py
```
Note: Use a port-forwarding tool such as ngrok to expose the endpoint created by the master.

To monitor logs streamed from remote workers on your local machine, run:

```
tensorboard --logdir logs
```
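The worker scripts presumably write TensorBoard event files into the `logs` directory. The snippet below is a generic illustration of how per-episode scores could be logged so they show up in that dashboard; the writer path and tag name are assumptions, not this project's actual logging code.

```python
import tensorflow as tf

# Hypothetical example: write one scalar per episode into the logs directory.
writer = tf.summary.create_file_writer("logs/worker-1")
with writer.as_default():
    for episode, score in enumerate([12.0, 35.0, 50.0]):
        tf.summary.scalar("train_score_per_episode", score, step=episode)
writer.flush()
```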
Now, on each remote workstation, run:

```
python train_worker.py \
  --env <ENV_NAME> \
  --master-endpoint <MASTER_ENDPOINT> \
  --worker-name <WORKER_NAME>
```
To train using the CNN-based model, run:

```
python train_worker_cnn.py \
  --env <ENV_NAME> \
  --master-endpoint <MASTER_ENDPOINT> \
  --worker-name <WORKER_NAME>
```
Alternatively, run a remote worker from Google Colab: https://colab.research.google.com/github/ArjunInventor/Deep-Q-Learning-Agent/blob/master/train_worker.ipynb
To watch a trained model play, run:

```
python play.py --model <MODEL_PATH> --env <ENV_NAME>
```
When using a CNN-based model, run:

```
python play_cnn.py --model <MODEL_PATH> --env <ENV_NAME>
```
Add the `--save-gif` flag to save the gameplay as a GIF.
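Conceptually, playback loads the trained Q-network and follows the greedy policy in the chosen environment. The loop below is only an illustrative sketch of that idea, assuming a saved Keras model and the classic `gym` API; it is not the code in play.py, and the environment and model path shown are placeholders.

```python
import gym
import numpy as np
import tensorflow as tf

# Hypothetical evaluation loop: act greedily with respect to the learned Q-values.
env = gym.make("CartPole-v1")                    # stands in for <ENV_NAME>
model = tf.keras.models.load_model("model.h5")   # stands in for <MODEL_PATH>

state, done, score = env.reset(), False, 0.0
while not done:
    q_values = model.predict(np.asarray(state)[None, ...], verbose=0)
    state, reward, done, info = env.step(int(np.argmax(q_values)))
    score += reward
print("Episode score:", score)
```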
This project was completed as part of an INT404 assignment; the final report can be found here.