A mouse finds the cheese with the help of reinforcement learning (value iteration).

alexgran875/find_the_cheese

Find the Cheese!

The goal of this project was to design an environment where a mouse learns to find the cheese. Below are demonstrations of every 10th game (up to game 30):

[Four demonstration images]

How It Works

This problem is similar to the gridworld problem described in chapter 4 of Reinforcement Learning: An Introduction (Second Edition) by Richard S. Sutton and Andrew G. Barto. The differences are as follows:

  • The agent is given a reward of 300 for entering the terminal state.
  • There is only one terminal state, in the bottom-right corner.
  • The gridworld is 48x40 (rows x columns).
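The setup described above could be represented roughly as follows. This is only a sketch, not the project's actual code: the names `step`, `ACTIONS` and `values` are hypothetical, and the per-move reward of -1 and the "stay in place when walking into a wall" rule are assumptions carried over from the textbook gridworld example.

```python
import numpy as np

ROWS, COLS = 48, 40                   # gridworld size (rows x columns)
TERMINAL = (ROWS - 1, COLS - 1)       # single terminal state, bottom right
GOAL_REWARD = 300.0                   # reward for entering the terminal state
STEP_REWARD = -1.0                    # assumption: -1 per move, as in the textbook example
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for taking `action` in `state`."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state                  # assumption: moving off the grid leaves the agent in place
    next_state = (r, c)
    reward = GOAL_REWARD if next_state == TERMINAL else STEP_REWARD
    return next_state, reward


# Value estimates for every tile; unexplored tiles start at zero.
values = np.zeros((ROWS, COLS))
```

For instance, `step((47, 38), "right")` enters the terminal tile and yields the reward of 300, while any other move yields the assumed step reward.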

The problem is formulated as a finite, undiscounted, episodic MDP. To make the problem harder, the agent can see at most 2 tiles in any direction and starts each game in a random position. Every frame, the values of all visible tiles are updated using the value iteration algorithm from chapter 4. As the agent explores the gridworld, the value function eventually converges. By acting greedily with respect to the converged value function, the agent can then reach the terminal state from any tile along a shortest path.
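One way the per-frame update could look is sketched below. It is an illustration under stated assumptions rather than the project's implementation: the function names `backup` and `update_visible` are hypothetical, the visible region is assumed to be a square window of radius 2 around the agent, the per-move reward is assumed to be -1, and walking into a wall is assumed to leave the agent in place. The update is done in place and without discounting.

```python
import numpy as np

ROWS, COLS = 48, 40
TERMINAL = (ROWS - 1, COLS - 1)
STEP_REWARD, GOAL_REWARD = -1.0, 300.0        # step reward of -1 is an assumption
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
VIEW = 2                                      # the agent sees at most 2 tiles away


def backup(values, r, c):
    """One-step lookahead from tile (r, c): return (best_value, best_move)."""
    best_value, best_move = -np.inf, None
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < ROWS and 0 <= nc < COLS):
            nr, nc = r, c                     # assumption: walls leave the agent in place
        if (nr, nc) == TERMINAL:
            q = GOAL_REWARD                   # the episode ends, so no future value is added
        else:
            q = STEP_REWARD + values[nr, nc]  # undiscounted (gamma = 1)
        if q > best_value:
            best_value, best_move = q, (dr, dc)
    return best_value, best_move


def update_visible(values, agent):
    """In-place value iteration sweep over the tiles the agent can see."""
    ar, ac = agent
    for r in range(max(0, ar - VIEW), min(ROWS, ar + VIEW + 1)):
        for c in range(max(0, ac - VIEW), min(COLS, ac + VIEW + 1)):
            if (r, c) != TERMINAL:            # the terminal state keeps a value of zero
                values[r, c], _ = backup(values, r, c)
```

In a game loop, each frame would call `update_visible(values, agent)` and then move the agent along the second element returned by `backup(values, *agent)`, which is the greedy move with respect to the current estimates.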

An interesting consequence of the negative reward on every transition is that, early on, the agent is motivated to go where it hasn't been before, i.e. to explore the gridworld. The longer it spends in an area, the lower the estimated values of those tiles become, so the agent moves towards unexplored tiles, whose values are still at their initial value of zero.
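The effect can be seen in a toy example. The sketch below uses a hypothetical 1-D corridor of 7 tiles with no terminal state in sight, a view radius of 1, and an assumed reward of -1 per move. None of these numbers come from the project; they are chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

values = np.zeros(7)   # hypothetical 1-D corridor; unexplored tiles start at 0
agent, view = 1, 1     # the agent stands on tile 1 and sees tiles 0..2
last = len(values) - 1

for _ in range(3):     # three frames spent without moving
    lo, hi = max(0, agent - view), min(last, agent + view)
    for s in range(lo, hi + 1):
        left = values[max(s - 1, 0)]        # walking into the wall keeps the agent in place
        right = values[min(s + 1, last)]
        values[s] = -1.0 + max(left, right)  # undiscounted backup, assumed reward of -1

# Greedy choice from the agent's tile: compare the values of its two neighbours.
greedy_move = "right" if values[agent + 1] > values[agent - 1] else "left"
```

After three frames the visible tiles hold -3, -2 and -1 while the unseen tiles are still 0, so the values increase towards the unexplored side and `greedy_move` is `"right"`.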

This Project Was Built Using

  • Pyglet (graphics)
  • Numpy (gridworld representation using a matrix)
