1 Assignment

For the assignment I need to choose 2 algorithms, train them on an environment we haven't seen before and compare them.

2. Evironment

2.1 What is Space invaders

Space invaders is a classic shoot them up arcade game from 1978 developped by Tomohiro Nishikado.
The goal is to defeat wave after wave of descending aliens with a horizontally moving laser to earn as many points as possible without getting hit by the aliens.

3 Algorithms

3.1 Which algorithms to compare

For this project I'll be using the Deep Q Learning algorithm and the Actor-critic algorithm.

Orignally I wanted to compare a value based apporach with a policy based one. For my value based algorithm I opted for DQN since I thought it would preform better than tabular Q-learning or SARSA because of the (possibly) large state space. I also find it a more intresting algorithm because of the neural network part. I compared some policy based approaches but in the end settled on actor-critic because of the time to train (didn't have that much time for this assignment) and lack of inherit overfitting. This is not a true policy based algorithm but rather a hybrid between policy and value based so in the end I wouldn't call it a true comparison.

The algorithms will be compared on:

Training time
Episodes rewards
Agent results

3.2 Deep Q learning (DQN)

Deep Q learning is a value based algorithm,
A value based algorithm trains a value-function that outputs the value of a state or state-action pair. That value-function is then used by the policy (which is chosen in advanced) to take an action.
Finding the optimal value-function will automatically result in finding the optimal policy.

Q-learning does this by initializing a Q-table (mappings of state-action pair & Q-value) chosing an action based on this table and updating it with the Bellman equation.
When there are too many states/ actions or a continous state space this approach doesn't work anymore because the Q-table becomes too large.
This is where DQN comes into play: Instead of getting the next action from the Q-table the model predicts the next action.
Instead of updating the Q-table we fit our model.

3.3 Actor-critic (AC)

Actor-critic is a hybrid between policy and value based algortihms with the actor part being policy based (it takes in a state and returns the probability of all actions) and the critic being value based (it takes in the state and the predicted value and returns a score to see how well the action fits)

4 Results

The Actor-critic algorithm seemed to be the better one in my case, it was a lot faster to train, had a higher high score and was more consistent

4.1 Demo

Both AC & DQN where trained on 100 episodes AC high score (690):

DQN high score (520):

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
Presentation space invaders.pdf		Presentation space invaders.pdf
README.md		README.md
Space_Invaders.jpg		Space_Invaders.jpg
space_invaders.ipynb		space_invaders.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

1 Assignment

2. Evironment

2.1 What is Space invaders

3 Algorithms

3.1 Which algorithms to compare

3.2 Deep Q learning (DQN)

3.3 Actor-critic (AC)

4 Results

4.1 Demo

About

Releases

Packages

Languages

4rn3/rl_school_project

Folders and files

Latest commit

History

Repository files navigation

1 Assignment

2. Evironment

2.1 What is Space invaders

3 Algorithms

3.1 Which algorithms to compare

3.2 Deep Q learning (DQN)

3.3 Actor-critic (AC)

4 Results

4.1 Demo

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages