MuZero O An Quan

An implementation of MuZero to play the game O An Quan, based on the Google DeepMind paper (Schrittwieser et al., Nov 2019) and the associated pseudocode.

Please refer to the documentation. The example model was trained on 100 self-play games over 1.5 hours. This implementation is primarily for educational purposes.
[Explanatory video of MuZero]

MuZero is a state-of-the-art reinforcement learning algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also closely related to Value Prediction Networks. See How it works.
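
For intuition, here is a minimal sketch of the three learned functions described in the paper: a representation function h that encodes the observation into a hidden state, a dynamics function g that predicts the next hidden state and reward, and a prediction function f that outputs the policy and value. The class names and dimensions below are illustrative only, not the networks used in this repository.

import torch
import torch.nn as nn

class Representation(nn.Module):   # h: observation -> hidden state
    def __init__(self, obs_dim=12, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())

    def forward(self, obs):
        return self.net(obs)

class Dynamics(nn.Module):         # g: (hidden state, action) -> (next state, reward)
    def __init__(self, hidden_dim=64, action_dim=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(hidden_dim + action_dim, hidden_dim), nn.ReLU())
        self.reward = nn.Linear(hidden_dim, 1)

    def forward(self, state, action_one_hot):
        next_state = self.trunk(torch.cat([state, action_one_hot], dim=-1))
        return next_state, self.reward(next_state)

class Prediction(nn.Module):       # f: hidden state -> (policy logits, value)
    def __init__(self, hidden_dim=64, action_dim=10):
        super().__init__()
        self.policy = nn.Linear(hidden_dim, action_dim)
        self.value = nn.Linear(hidden_dim, 1)

    def forward(self, state):
        return self.policy(state), self.value(state)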

Features

  • Residual network and fully connected network in PyTorch
  • Multi-Threaded/Asynchronous/Cluster with Ray (see the sketch after this list)
  • Multi-GPU support for training and self-play
  • TensorBoard real-time monitoring
  • Model weights automatically saved at checkpoints
  • Single-player and two-player modes
  • Commented and documented
  • Pretrained weights available
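
As a rough illustration of the Ray-based self-play pattern mentioned above (the names below are hypothetical, not this repository's actual classes), each worker is a Ray actor that plays games in its own process while results are gathered asynchronously:

import ray

ray.init()

@ray.remote
class SelfPlayWorker:
    """Hypothetical actor: each instance plays games independently."""

    def __init__(self, seed):
        self.seed = seed

    def play_game(self):
        # Placeholder for one self-play game driven by MCTS.
        return f"game history from worker {self.seed}"

workers = [SelfPlayWorker.remote(seed) for seed in range(4)]
histories = ray.get([w.play_game.remote() for w in workers])  # runs in parallel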

Further improvements

Here is a list of features that would be interesting to add but are not in the MuZero paper. We are open to contributions and other ideas.

Picking up from the original implementation by Werner Duvaud, we have also implemented:

  • Batch MCTS (the core idea is sketched after this list)
  • Support for games with more than two players
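
The core idea behind batch MCTS is to collect the leaf hidden states from several searches and evaluate them in a single network call rather than one at a time. A minimal sketch, assuming a prediction network like the hypothetical one above:

import torch

def evaluate_leaves_batched(prediction_network, leaf_hidden_states):
    # leaf_hidden_states: hidden-state tensors gathered during the selection
    # phase of several trees; one batched forward pass amortizes network cost.
    batch = torch.stack(leaf_hidden_states)
    with torch.no_grad():
        policy_logits, values = prediction_network(batch)
    return policy_logits, values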

Tests are run on Ubuntu with 16 GB RAM, an Intel i7 CPU, and a GTX 1050 Ti Max-Q GPU. We verify that training makes progress and that the agent reaches a level of play showing it has learned, but it does not consistently reach human level. For certain environments, we notice a regression after a certain amount of time. The provided configurations are certainly not optimal, and we have not yet focused on hyperparameter optimization. Any help is welcome.

Code structure

[Code structure diagram]

Network summary: [diagram]

Getting started

Installation

git clone https://github.com/dmtrung14/muzero-oanquan.git
cd muzero-oanquan

pip install -r requirements.lock

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

Config

You can adapt the configuration of each game by editing the MuZeroConfig class in the corresponding file in the games folder.
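
For example, here are a few of the hyperparameters typically exposed by a MuZeroConfig class. This is an illustrative excerpt in the style of the muzero-general codebase this project builds on; check the actual game file for the exact attribute names and values.

class MuZeroConfig:
    def __init__(self):
        self.num_simulations = 50  # MCTS simulations per move
        self.max_moves = 200       # hard cap on the length of a game
        self.discount = 1.0        # board games are usually undiscounted
        self.batch_size = 64       # training batch size
        self.lr_init = 0.002       # initial learning rate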

Related work

  • EfficientZero (Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao)
  • Sampled MuZero (Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver)

Authors

  • Trung Dang
  • Werner Duvaud, Aurèle Hainaut, and Paul Lenoir
  • Contributors

Getting involved