Temporal difference learning for ultimate tic-tac-toe.
It's like tic-tac-toe, but each square of the big game contains another game of tic-tac-toe inside it! Win small games to claim their squares in the big game. Simple, right? But there's a catch: whichever small square you pick is the big square your opponent must play in next.
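For example, that forwarding rule boils down to a single constraint on legal moves. The sketch below uses a hypothetical board representation (cells and small boards both indexed 0-8), not this repo's actual data structures:

def next_forced_board(cell_index, finished_boards):
    # The cell you play inside a small board (0-8) names the big square,
    # i.e. the small board, your opponent must play in next.
    # If that board is already won or drawn, the opponent moves freely.
    if cell_index in finished_boards:
        return None  # free choice of any unfinished board
    return cell_index

print(next_forced_board(4, finished_boards={0, 8}))  # 4: must play the centre board
print(next_forced_board(8, finished_boards={0, 8}))  # None: free move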
Temporal difference (TD) learning is a reinforcement learning algorithm; here the model is trained purely through self-play. The algorithm learns by bootstrapping from the current estimate of the value function, i.e. the value of a state is updated towards the current estimate of the value of future states.
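As a minimal illustration of that update (tabular TD(0); the state names, rewards, and hyperparameter values below are assumptions for the sketch, not this repo's actual code):

from collections import defaultdict

def td0_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    # Move V(state) towards the bootstrapped target r + gamma * V(next_state).
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])

values = defaultdict(float)  # value estimates, default 0 for unseen states
# One illustrative self-play trajectory of (state, next_state, reward) triples,
# with a reward of +1 only on the final, winning transition.
for state, next_state, reward in [("s0", "s1", 0.0), ("s1", "s2", 0.0), ("s2", "end", 1.0)]:
    td0_update(values, state, next_state, reward)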
To begin training:
python train.py
or set the learning hyperparameters using any of the optional arguments:
python train.py --lr LEARN_RATE --a ALPHA --e EPSILON
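For example (the values here are purely illustrative, not tuned defaults):

python train.py --lr 0.01 --a 0.9 --e 0.1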
You can play against a trained model using:
python player.py --params path/to/parameters.params
If no parameters are provided, the opponent will make moves randomly.
Coming soon.
- Scale the value of terminal results by the game length to prefer shorter games (see the sketch after this list).
- Implement the UT3 neural network in other frameworks, e.g. TensorFlow.
- Make training asynchronous, i.e. run self-play, neural-net training, and model comparison in parallel.
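A sketch of the first item (hypothetical names; exponential discounting by game length is one common way to do this, not necessarily what will land in the repo):

def scaled_result(result, game_length, discount=0.99):
    # result is +1 for a win, -1 for a loss, 0 for a draw.
    # Shorter games keep more of the terminal value, so the agent
    # prefers quick wins (and, symmetrically, drawn-out losses).
    return result * discount ** game_length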