Repository for our project for the 2019 MVA course "Graphs in Machine Learning".
In this work, we address the problem of learning general policies with Graph Neural Networks (GNNs) to solve Sokoban, a difficult planning puzzle. We build a graph representation of Sokoban levels, learn policies with a Deep Q-learning approach, and then investigate the transfer and generalization capabilities that GNNs give the learned policies.
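To give an idea of what such a graph representation can look like, here is a minimal sketch that turns a grid of cell types into a `torch_geometric.data.Data` object with one node per cell and edges between 4-neighbors. The cell encoding and the function name are illustrative assumptions, not the exact representation used in this repository:

```python
import torch
from torch_geometric.data import Data

# Hypothetical cell types: 0 = floor, 1 = wall, 2 = box, 3 = target, 4 = player.
def grid_to_graph(grid):
    """Turn an H x W nested list of cell types into a graph with one node per cell."""
    height, width = len(grid), len(grid[0])
    # One-hot node features, one node per cell (row-major ordering).
    cell_types = torch.tensor([c for row in grid for c in row], dtype=torch.long)
    x = torch.nn.functional.one_hot(cell_types, num_classes=5).float()
    # Edges between horizontally / vertically adjacent cells, in both directions.
    edges = []
    for i in range(height):
        for j in range(width):
            u = i * width + j
            if j + 1 < width:
                edges += [(u, u + 1), (u + 1, u)]
            if i + 1 < height:
                edges += [(u, u + width), (u + width, u)]
    edge_index = torch.tensor(edges, dtype=torch.long).t().contiguous()
    return Data(x=x, edge_index=edge_index)
```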
This implementation uses Python 3.6, PyTorch, and CUDA 10.1.
First, install the requirements needed to run this code:
pip install -r requirements.txt
Then, install PyTorch Geometric by following its official installation instructions.
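To check that the installation works, you can run this optional sanity check (not part of the repository's scripts):

```python
import torch
import torch_geometric

# Print the installed versions and whether a CUDA device is visible.
print("torch:", torch.__version__, "| torch_geometric:", torch_geometric.__version__)
print("CUDA available:", torch.cuda.is_available())
```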
Before training a model, you will need to generate Sokoban levels in .png format. We implemented three ways of generating levels (a minimal rendering sketch is given after the list):
- Using the gym-sokoban environment:
python data/generate_gym.py
- Generate dummy levels:
python data/generate_dummy.py --dir [SAVE_DIR] --width [WIDTH] --height [HEIGHT] --boxes [NB_BOXES] --levels [NB_LEVELS]
- Build custom levels (requires Tkinter):
python data/level_builder.py --dir [SAVE_DIR] --width [WIDTH] --height [HEIGHT] --size [CELL_SIZE]
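As a rough illustration of the .png level format, the sketch below renders a grid of cell names as colored squares. The color coding, cell size, and function name are assumptions made for illustration; the actual encoding used by the generation scripts may differ:

```python
import numpy as np
from PIL import Image

# Hypothetical color coding (the generation scripts may use different colors).
COLORS = {
    "floor": (243, 248, 238),
    "wall": (0, 0, 0),
    "box": (142, 121, 56),
    "target": (254, 126, 125),
    "player": (160, 212, 56),
}

def save_level_png(cells, path, cell_size=16):
    """Render a grid of cell names (e.g. [["wall", "floor", ...], ...]) to a .png file."""
    height, width = len(cells), len(cells[0])
    img = np.zeros((height * cell_size, width * cell_size, 3), dtype=np.uint8)
    for i, row in enumerate(cells):
        for j, cell in enumerate(row):
            # Fill the square corresponding to this cell with its color.
            img[i * cell_size:(i + 1) * cell_size,
                j * cell_size:(j + 1) * cell_size] = COLORS[cell]
    Image.fromarray(img).save(path)
```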
After generating levels, you can train models:
python train.py #parameters
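For example, to train on the dummy levels generated above while keeping most defaults (the values below are only an illustration):
python train.py --train_path levels/dummy/train --test_path levels/dummy/test --epochs 200 --batch_size 32 --gpu 0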
The parameters for training are the following:
- `--train_path`, default `"levels/dummy/train"`: training levels directory
- `--test_path`, default `"levels/dummy/test"`: test levels directory
- `--training_id`, default `str(int(time.time()))`: log folder name
- `--logs`, default `"./logs/"`: log folder
- `--pretrained`, default `False`: use a pretrained model
- `--weights_path`, default `""`: path to pretrained weights
- `--save_every`, default `10`: interval (in epochs) at which model weights are saved
- `--gpu`, default `0`: which GPU ID to use
- `--cpu`, default `False`: use the CPU
- `--epochs`, default `500`: number of epochs
- `--seed`, default `123`: random seed
- `--max_steps`, default `25`: maximum steps in the environment during training
- `--max_steps_eval`, default `25`: maximum steps in the environment during evaluation
- `--batch_size`, default `32`: batch size
- `--buffer_size`, default `10000`: replay memory buffer size
- `--target_update`, default `100`: target network update period
- `--gamma`, default `1.0`: discount factor
- `--eps_max`, default `1.0`: epsilon-greedy initial value
- `--eps_min`, default `0.1`: epsilon-greedy final (minimum) value
- `--eps_stop_step`, default `100000`: number of steps after which epsilon reaches its minimum value
- `--hiddens`, default `64`: number of hidden units
- `--num_message_passing`, default `2`: number of EdgeConv layers (see the sketch after this list)
- `--lr`, default `0.0005`: learning rate
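For reference, here is a minimal sketch of what a Q-network built from `--num_message_passing` EdgeConv layers with `--hiddens` hidden units could look like in PyTorch Geometric. The class name, readout, and number of actions are assumptions made for illustration and may not match the architecture implemented in this repository:

```python
import torch
from torch import nn
from torch_geometric.nn import EdgeConv, global_max_pool

class SokobanQNet(nn.Module):
    """Sketch of a Q-network: message passing over the level graph,
    then a graph-level readout that scores each action."""

    def __init__(self, in_features, hiddens=64, num_message_passing=2, num_actions=4):
        super().__init__()
        convs = []
        dim = in_features
        for _ in range(num_message_passing):
            # EdgeConv applies an MLP to the concatenated pair [x_i, x_j - x_i].
            mlp = nn.Sequential(nn.Linear(2 * dim, hiddens), nn.ReLU())
            convs.append(EdgeConv(mlp, aggr="max"))
            dim = hiddens
        self.convs = nn.ModuleList(convs)
        self.head = nn.Linear(hiddens, num_actions)

    def forward(self, x, edge_index, batch):
        for conv in self.convs:
            x = conv(x, edge_index)
        # Pool node embeddings into one vector per level, then predict Q-values.
        return self.head(global_max_pool(x, batch))
```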
Information about the training run is stored in the log directory, and you can monitor it with the notebook notebooks/history_vizualization.ipynb.
- PyTorch Geometric - the graph neural network library used.
- Mathieu Orhan - mathieuorhan
- Bastien Déchamps - Bast24