UNDER DEVELOPMENT!!
- pip

  Just run `pip install -r requirements.txt` to install the dependencies. Be careful with the Python version and global packages.
- virtualenv

  If you're familiar with virtualenv, create the environment with `virtualenv demo`, activate it with `source demo/bin/activate`, and finally install the requirements with `pip install -r requirements.txt`. Of course, virtualenvwrapper is more pleasant.
- pipenv (highly recommended)

  If you can use pipenv, that's perfect. Run `pipenv install` to create the project environment and install all of its dependencies. Make sure Python 3.6 is installed on your system.
- pip/virtualenv

  Run `python3.6 main.py -h` directly to see the help page.
- pipenv

  Run `pipenv run python3.6 main.py -h` to see the help.
```
usage: main.py [-h] {train,run} ...

This is a demo to show how Q_learning makes agent intelligent

optional arguments:
  -h, --help   show this help message and exit

mode:
  {train,run}  Choose a mode
    train      Train an agent
    run        Make an agent run
```
Help for the `train` subcommand:
```
usage: main.py train [-h] [-m {c,r}] [-r ROUND] [-l] [-s] [-c CONFIG_FILE]
                     [-d {t}] [-a]

optional arguments:
  -h, --help            show this help message and exit
  -m {c,r}, --mode {c,r}
                        Training mode, by rounds or by convergence
  -r ROUND, --round ROUND
                        Training rounds, neglect when convergence is chosen
  -l, --load            Whether to load Q table from a csv file when training
  -s, --show            Show the training process.
  -c CONFIG_FILE, --config_file CONFIG_FILE
                        Config file for significant parameters
  -d {t}, --demo {t}    Choose a demo to run
  -a, --heuristic       Whether to use a heuristic iteration
  -g {Q,SARSA}, --algorithm {Q,SARSA}
                        Training algorithm: Q or SARSA, default is Q
```
Details:
- m
  Termination mode of training. `c` stands for 'convergence', `r` stands for 'round'. If `c` is chosen, the agent stops only when the Q table has converged. If `r` is chosen, the agent is only trained for a certain number of rounds (which can be set with the `-r` flag).
- l
  Load the Q table from a CSV file. The file name can be modified in the program. If the flag is not given, a new Q table is built.
- r
  Number of rounds to train the warrior. Ignored if `-m c` is chosen.
- s
  Show the training process when this flag is selected.
- c
A config file can be specified for training with this argument.
- d
Choose a demo to train.
- a
Whether to use the heuristic policy to accelerate the training progress.
- g
Choose an algorithm from {Q, SARSA, DoubleQ}; a sketch of the Q and SARSA update rules follows this list.
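For reference, the difference between the `Q` and `SARSA` choices of `-g` is only in how the update target is built. The snippet below is a minimal sketch of the two tabular update rules using the `alpha` (learning rate) and `gamma` (discount factor) parameters from the config file; the dictionary-based Q table and the function names are assumptions made for the example, not this project's actual code.

```python
# Illustrative tabular updates; Q is assumed to be a dict mapping
# (state, action) pairs to values -- not necessarily this project's layout.

def q_learning_update(Q, state, action, reward, next_state, actions, alpha, gamma):
    # Q-learning is off-policy: the target uses the best possible next action.
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * (target - Q.get((state, action), 0.0))

def sarsa_update(Q, state, action, reward, next_state, next_action, alpha, gamma):
    # SARSA is on-policy: the target uses the action actually taken next.
    target = reward + gamma * Q.get((next_state, next_action), 0.0)
    Q[(state, action)] = Q.get((state, action), 0.0) + alpha * (target - Q.get((state, action), 0.0))
```

Double Q-learning (the `DoubleQ` option) keeps two such tables and uses one to select the next action and the other to evaluate it, which reduces the overestimation plain Q-learning can suffer from.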
Help for the `run` subcommand:
```
usage: main.py run [-h] [-d {t}] [-q Q]

optional arguments:
  -h, --help          show this help message and exit
  -d {t}, --demo {t}  Choose a demo to run
  -q Q                Choose a Q table from a csv file
```
Details:
- d
Choose a demo to run.
- q
Specify a Q table file (CSV) to use for the run.
The config file must be a YAML file containing the following parameters:
```yaml
size: 10
epsilon: 0.9
gamma: 0.9
alpha: 0.1
speed: 0.1
```
- size
The length of the map.
- epsilon
  The probability of choosing a random action; otherwise the action with the maximum Q value for the current state is chosen (see the sketch after this list).
- gamma
Discount factor.
- alpha
Learning rate.
- speed
Speed of displaying.
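As a rough illustration of how these parameters are consumed, the sketch below loads such a YAML file with PyYAML and uses `epsilon` for action selection exactly as described above. It is only a hedged example: the file name `config.yml`, the action set, and the Q table layout are assumptions, and the project's actual loading code may differ.

```python
import random
import yaml  # PyYAML

# Load the parameters from a YAML config file (the file name is just an example).
with open("config.yml") as f:
    cfg = yaml.safe_load(f)

size, epsilon = cfg["size"], cfg["epsilon"]
gamma, alpha, speed = cfg["gamma"], cfg["alpha"], cfg["speed"]

ACTIONS = ["left", "right"]  # assumed action set for the 1-D demo
Q = {}                       # assumed (state, action) -> value table

def choose_action(state):
    # With probability epsilon pick a random action; otherwise pick the
    # action with the maximum Q value for the current state.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
```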
After convergence of training:
```
Xo_________T
X_o________T
X__o_______T
X___o______T
X____o_____T
X_____o____T
X______o___T
X_______o__T
X________o_T
X_________oT
X__________o
```
The agent can find the treasure directly.
Run `pipenv run python main.py train -d 2d -s` to enjoy the training process.
Run `pipenv run python main.py run -d 2d` to watch the result.
```
|@| | |+| | | | | | | | |+|X| | | | |+| | | | | |X| | | | | | | | | | | | | |X|X|+| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |X| | |+| | | | | | | | |X|X| |+| | | |+| | | | | | | | | | | | |+| | | |X|#|
```