<div align="center">
<img width="300px" height="auto" src="./docs/.images/xingtian-logo.png">
</div>

[中文](./README.cn.md)

## Introduction

[License: MIT](https://opensource.org/licenses/MIT)

**XingTian (刑天)** is a componentized library for developing and verifying reinforcement learning (RL) algorithms. It supports multiple algorithms, including DQN, DDPG, PPO, and IMPALA, and can train agents in multiple environments such as Gym, Atari, Torcs, and StarCraft. To let users quickly verify algorithms and solve RL problems, XingTian abstracts four modules: `Algorithm`, `Model`, `Agent`, and `Environment`. These modules are combined much like Lego building blocks. For details about the architecture, see the [Architecture introduction](./docs/basic_arch.en.md).
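
The toy sketch below illustrates the building-block idea by showing how four such modules might cooperate in a training loop. All class and method names (`Environment.step`, `Model.predict`, `Algorithm.train`, `Agent.run_episode`) are hypothetical and chosen for illustration only; they are not XingTian's actual API, which the Architecture introduction describes.

```python
# Toy illustration of four pluggable modules. All names are hypothetical,
# chosen for this sketch; they are not XingTian's actual API.


class Environment:
    """Rewards 1.0 whenever the action equals a hidden target action."""

    def __init__(self, target: int = 1, max_steps: int = 5):
        self.target = target
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> int:
        self.steps = 0
        return 0  # single dummy observation

    def step(self, action: int):
        self.steps += 1
        reward = 1.0 if action == self.target else 0.0
        return 0, reward, self.steps >= self.max_steps


class Model:
    """Holds one preference score per action and acts greedily."""

    def __init__(self, action_dim: int):
        self.prefs = [0.0] * action_dim

    def predict(self, obs: int) -> int:
        # Highest preference wins; max() keeps the lowest index on ties.
        return max(range(len(self.prefs)), key=lambda a: self.prefs[a])


class Algorithm:
    """Raises preferences of rewarded actions and lowers the others."""

    def __init__(self, model: Model, lr: float = 0.5):
        self.model = model
        self.lr = lr

    def train(self, batch):
        for action, reward in batch:
            self.model.prefs[action] += self.lr * (reward - 0.5)


class Agent:
    """Collects one episode with the model, then hands the data to the algorithm."""

    def __init__(self, env: Environment, alg: Algorithm):
        self.env = env
        self.alg = alg

    def run_episode(self) -> float:
        obs, done, total, batch = self.env.reset(), False, 0.0, []
        while not done:
            action = self.alg.model.predict(obs)
            obs, reward, done = self.env.step(action)
            batch.append((action, reward))
            total += reward
        self.alg.train(batch)
        return total


if __name__ == "__main__":
    agent = Agent(Environment(target=1), Algorithm(Model(action_dim=2)))
    print([agent.run_episode() for _ in range(3)])
```

With these settings the first episode scores 0.0 (the greedy model starts on action 0), after which the update flips the preference and the next two episodes score 5.0. Because the agent only touches the environment and algorithm through a few narrow methods, any one piece (say, a different `Model`) can be swapped without editing the others, which is the spirit of the Lego analogy.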

## Dependencies

```shell
# Ubuntu 18.04: system packages and OpenCV
sudo apt-get install python3-pip libopencv-dev redis-server -y
pip3 install opencv-python

# Python packages for the TensorFlow 1.15.0 backend
pip3 install zmq h5py "gym[atari]" tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 line_profiler redis absl-py psutil
```

Alternatively, install the Python packages with `pip3 install -r requirements.txt`.

If you want to use PyTorch as the backend, install it separately by following the [official PyTorch guide](https://pytorch.org/get-started/locally/).
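
A quick way to confirm the Python dependencies above is to look each one up with `importlib.util.find_spec`, which locates a module without importing it. This is a hypothetical helper, not part of XingTian, and the pip-name to import-name mapping in it is an assumption (for example, `opencv-python` is imported as `cv2`, `pyyaml` as `yaml`, and `absl-py` as `absl`).

```python
import importlib.util

# pip package name -> import name (assumed mapping for the packages listed above)
REQUIRED = {
    "opencv-python": "cv2",
    "zmq": "zmq",
    "h5py": "h5py",
    "gym": "gym",
    "tqdm": "tqdm",
    "imageio": "imageio",
    "matplotlib": "matplotlib",
    "Ipython": "IPython",
    "pyyaml": "yaml",
    "tensorflow": "tensorflow",
    "pyarrow": "pyarrow",
    "lz4": "lz4",
    "fabric2": "fabric2",
    "line_profiler": "line_profiler",
    "redis": "redis",
    "absl-py": "absl",
    "psutil": "psutil",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import name cannot be found (nothing is imported)."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it with the same interpreter that `pip3` installs into; a non-empty list names the packages that still need to be installed.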

## Installation
```zsh
# cd PATH/TO/XingTian
pip3 install -e .
```

After installation, you can check whether it succeeded with `import xt; print(xt.__version__)`:

```python
In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.1.1'
```

## Quick Start

---------
#### Setup configuration
The following configuration is a minimal example using the [CartPole](https://gym.openai.com/envs/CartPole-v0/) environment.
A more detailed description of the agent, algorithm, and environment parameters can be found in the [User guide](./docs/user.en.md).

```yaml
alg_para:
  alg_name: PPO
env_para:
  env_name: GymEnv
  env_info: {'name': CartPole-v0, 'vision': False}
agent_para:
  agent_name: CartpolePpo
  agent_num : 1
  agent_config: {
    'max_steps': 200,
    'complete_step': 500000}
model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    summary: True

env_num: 10
```

More example configurations can be found in the [examples](./examples) directory.

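Because a misspelled section name is easy to miss, a small pre-flight check on the loaded configuration can save a failed launch. This is a hypothetical helper, not part of XingTian: it treats the sections and keys used by the minimal CartPole example as required, which is an assumption (the User guide is the authority on which keys are mandatory). The dictionary mirrors what PyYAML's `yaml.safe_load` returns for that example.

```python
# Top-level sections used in the minimal CartPole example (assumed to be required).
REQUIRED_SECTIONS = ("alg_para", "env_para", "agent_para", "model_para", "env_num")

# Keys the example sets inside each section (assumed to be required).
REQUIRED_KEYS = {
    "alg_para": ("alg_name",),
    "env_para": ("env_name", "env_info"),
    "agent_para": ("agent_name", "agent_num"),
    "model_para": ("actor",),
}


def config_problems(cfg: dict) -> list:
    """Return human-readable problems found in a loaded config dict."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in cfg]
    for section, keys in REQUIRED_KEYS.items():
        if section not in cfg:
            continue  # already reported as a missing section
        body = cfg[section]
        if not isinstance(body, dict):
            problems.append(f"section is not a mapping: {section}")
            continue
        problems += [f"missing key: {section}.{k}" for k in keys if k not in body]
    return problems


# The minimal CartPole example, as yaml.safe_load would return it.
cartpole_ppo = {
    "alg_para": {"alg_name": "PPO"},
    "env_para": {"env_name": "GymEnv",
                 "env_info": {"name": "CartPole-v0", "vision": False}},
    "agent_para": {"agent_name": "CartpolePpo", "agent_num": 1,
                   "agent_config": {"max_steps": 200, "complete_step": 500000}},
    "model_para": {"actor": {"model_name": "PpoMlp", "state_dim": [4],
                             "action_dim": 2, "summary": True}},
    "env_num": 10,
}

if __name__ == "__main__":
    print(config_problems(cartpole_ppo) or "config looks complete")
```

To check a real file, replace the dictionary literal with the result of `yaml.safe_load` on the opened file; PyYAML is already among the listed dependencies.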
#### Start training task

```zsh
python3 xt/main.py -f examples/cartpole_ppo.yaml -t train
```

#### Evaluate trained model

To evaluate a trained model, set `test_node_config` and `test_model_path` in your configuration file (`YOUR_CONFIG_FILE.yaml`), then run:

```zsh
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate
```

> NOTE: XingTian runs with `-t train` by default.

#### Run with CLI

```zsh
# `xt_main` can be used in place of `python3 xt/main.py`
xt_main -f examples/cartpole_ppo.yaml -t train
```

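When launching several runs from a script, the commands above can be assembled programmatically. The sketch below only builds the argument list from the `-f` and `-t` options shown above; the helper name `xt_command` is hypothetical, restricting `task` to the two values shown in this README is an assumption, and actually executing the result assumes XingTian is installed so that `xt_main` is on your `PATH`.

```python
import shlex


def xt_command(config_path: str, task: str = "train", use_cli: bool = True) -> list:
    """Build the argv for a XingTian run (hypothetical helper).

    task is "train" or "evaluate"; use_cli selects `xt_main` over `python3 xt/main.py`.
    """
    if task not in ("train", "evaluate"):
        raise ValueError(f"unsupported task: {task!r}")
    entry = ["xt_main"] if use_cli else ["python3", "xt/main.py"]
    return entry + ["-f", config_path, "-t", task]


if __name__ == "__main__":
    for task in ("train", "evaluate"):
        cmd = xt_command("examples/cartpole_ppo.yaml", task)
        print(shlex.join(cmd))
        # To execute instead of printing: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting issues with paths that contain spaces.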
## Develop with Custom Cases

1. Write your custom module and register it. More detailed guidance on custom modules can be found in the [Developer Guide](./docs/developer.en.md).
2. Add the name of your custom module to `your_train_configure.yaml`.
3. Start training with `xt_main -f path/to/your_train_configure.yaml` :)

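Steps 1 and 2 rely on a name-based registry: a module is registered under a name, and the configuration file refers to it by that name (as `agent_name: CartpolePpo` does in the Quick Start example). The sketch below shows that general pattern in plain Python. The `Registry` class, its `register` decorator, `MyCustomAgent`, and passing `agent_config` as constructor arguments are all hypothetical stand-ins; consult the Developer Guide for XingTian's actual registration API.

```python
class Registry:
    """Minimal name -> class registry (hypothetical stand-in for XingTian's own)."""

    def __init__(self, kind: str):
        self.kind = kind
        self._classes = {}

    def register(self, cls):
        """Class decorator: register cls under its class name."""
        if cls.__name__ in self._classes:
            raise KeyError(f"{self.kind} {cls.__name__!r} is already registered")
        self._classes[cls.__name__] = cls
        return cls

    def build(self, name: str, **kwargs):
        """Instantiate the class registered under name, as a config lookup would."""
        try:
            cls = self._classes[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind} {name!r}; did you register it?") from None
        return cls(**kwargs)


agents = Registry("agent")


@agents.register
class MyCustomAgent:
    """Step 1: a custom module, registered under the name 'MyCustomAgent'."""

    def __init__(self, max_steps: int = 200):
        self.max_steps = max_steps


# Step 2: the config names the module; the framework looks it up by that name.
agent_para = {"agent_name": "MyCustomAgent", "agent_config": {"max_steps": 100}}
agent = agents.build(agent_para["agent_name"], **agent_para["agent_config"])
print(type(agent).__name__, agent.max_steps)
```

Keying the registry by class name keeps the YAML and the code in sync: renaming the class without updating the configuration fails fast with an explicit "unknown agent" error instead of a silent fallback.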

## Reference Results

#### Episode Reward Average

1. **DQN** Reward after 10M time-steps (**40M frames**).

| env           | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
| ------------- | ------------------ | --------------- | ----------------- |
| BeamRider     | 6706               | 2869            | ~2000             |
| Breakout      | 352                | 287             | ~150              |
| QBert         | 14087              | 3921            | ~4000             |
| SpaceInvaders | 947                | 650             | ~500              |

2. **PPO** Reward after 10M time-steps (**40M frames**).

| env           | XingTian PPO | RLlib PPO | Baselines PPO |
| ------------- | ------------ | --------- | ------------- |
| BeamRider     | 4204         | 2807      | ~1800         |
| Breakout      | 243          | 104       | ~250          |
| QBert         | 12288        | 11085     | ~14000        |
| SpaceInvaders | 1135         | 671       | ~800          |

#### Throughput

1. **DQN**

| env           | XingTian Basic DQN | RLlib Basic DQN |
| ------------- | ------------------ | --------------- |
| BeamRider     | 129                | 109             |
| Breakout      | 117                | 113             |
| QBert         | 111                | 90              |
| SpaceInvaders | 115                | 100             |

2. **PPO**

| env           | XingTian PPO | RLlib PPO |
| ------------- | ------------ | --------- |
| BeamRider     | 1775         | 1618      |
| Breakout      | 1801         | 1535      |
| QBert         | 1741         | 1617      |
| SpaceInvaders | 1858         | 1608      |

> Experimental setup: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with a single Tesla V100 GPU.
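
To read the throughput tables as relative speedups, divide each XingTian figure by the RLlib figure in the same row. The snippet below copies the numbers from those tables (their unit is not stated here, and the ratio does not depend on it); it is only a reading aid, not an additional benchmark.

```python
# Throughput figures copied from the tables above: env -> (XingTian, RLlib).
THROUGHPUT = {
    "DQN": {"BeamRider": (129, 109), "Breakout": (117, 113),
            "QBert": (111, 90), "SpaceInvaders": (115, 100)},
    "PPO": {"BeamRider": (1775, 1618), "Breakout": (1801, 1535),
            "QBert": (1741, 1617), "SpaceInvaders": (1858, 1608)},
}


def speedups(table: dict) -> dict:
    """Ratio XingTian / RLlib per environment, rounded to two decimals."""
    return {env: round(xt / rllib, 2) for env, (xt, rllib) in table.items()}


if __name__ == "__main__":
    for alg, table in THROUGHPUT.items():
        print(alg, speedups(table))
```

By this reading, XingTian's throughput is roughly 4% to 23% higher than RLlib's for DQN and roughly 8% to 17% higher for PPO on these four games.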

## Acknowledgement

XingTian draws on the following projects: [DeepMind/scalable_agent](https://github.com/deepmind/scalable_agent), [baselines](https://github.com/openai/baselines), and [ray](https://github.com/ray-project/ray).

## License

The MIT License (MIT)
