
Commit 233693f

0.1.1 release

1 parent 746f075 commit 233693f

152 files changed: +11918 −0 lines


LICENSE (+21)

MIT License

Copyright (C) 2020. Huawei Technologies Co., Ltd. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

README.cn.md (+159)

<div align="center">
<img width="300px" height="auto" src="./docs/.images/xingtian-logo.png">
</div>

[English](./README.md)

## Introduction

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**XingTian (刑天)** is a componentized reinforcement learning library for developing and verifying RL algorithms. It currently supports algorithm families including DQN, DDPG, PPO, and IMPALA, and can train agents in many environments, such as Gym, Atari, Torcs, and StarCraft. To let users quickly verify and solve RL problems, XingTian abstracts four modules: `Algorithm`, `Model`, `Agent`, and `Environment`. They combine like "Lego" building blocks. For more details, [read the architecture introduction](./docs/basic_arch.cn.md).

## System dependencies

```shell
# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev redis-server -y
pip3 install opencv-python

# run with tensorflow 1.15.0
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 line_profiler redis absl-py psutil
```

The dependencies can also be installed with pip: `pip3 install -r requirements.txt`

If you want to use PyTorch as the backend engine, please install it yourself. [Ref Pytorch](https://pytorch.org/get-started/locally/)

## Installation

```zsh
# cd PATH/TO/XingTian
pip3 install -e .
```

Run `import xt; print(xt.__version__)` to confirm that the installation succeeded.

```python
In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.1.1'
```

## Quick start

---------
#### Setup configuration

Below is an example configuration for the simple [CartPole](https://gym.openai.com/envs/CartPole-v0/) task; a training task is assembled by configuring the algorithm and environment information already registered in the system. A more detailed description of the parameters can be found in the [User guide](./docs/user.cn.md).

```yaml
alg_para:
  alg_name: PPO
env_para:
  env_name: GymEnv
  env_info: {'name': CartPole-v0, 'vision': False}
agent_para:
  agent_name: CartpolePpo
  agent_num : 1
  agent_config: {
    'max_steps': 200,
    'complete_step': 500000}
model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    summary: True

env_num: 10
```

More training configuration examples can be found in the [examples](./examples) directory.

#### Start a training task

```python3 xt/main.py -f examples/cartpole_ppo.yaml -t train```

![img](./docs/.images/cartpole.gif)

#### Evaluate a model

For an evaluation task, set the `test_node_config` and `test_model_path` parameters in your `.yaml` file, then run the evaluation with `-t evaluate`:

```python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate```

> The system starts a training task by default, i.e. the default value of -t is train

#### Run with CLI

```zsh
# In a terminal, `xt_main` can be used in place of `python3 xt/main.py`
xt_main -f examples/cartpole_ppo.yaml -t train
```

## Developing custom tasks

1. Write a custom module and register it. See the [Developer guide](./docs/developer.cn.md)
2. Set the custom module's name in the configuration file `your_train_configure.yaml`
3. Start training with `xt_main -f path/to/your_train_configure.yaml` :)

## Reference results

#### Average episode reward

1. **DQN** converged reward after 10M steps (**40M frames**).

| env           | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
| ------------- | ------------------ | --------------- | ----------------- |
| BeamRider     | 6706               | 2869            | ~2000             |
| Breakout      | 352                | 287             | ~150              |
| QBert         | 14087              | 3921            | ~4000             |
| SpaceInvaders | 947                | 650             | ~500              |

2. **PPO** converged reward after 10M steps (**40M frames**).

| env           | XingTian PPO | RLlib PPO | Baselines PPO |
| ------------- | ------------ | --------- | ------------- |
| BeamRider     | 4204         | 2807      | ~1800         |
| Breakout      | 243          | 104       | ~250          |
| QBert         | 12288        | 11085     | ~14000        |
| SpaceInvaders | 1135         | 671       | ~800          |

#### Throughput

1. **DQN**

| env           | XingTian Basic DQN | RLlib Basic DQN |
| ------------- | ------------------ | --------------- |
| BeamRider     | 129                | 109             |
| Breakout      | 117                | 113             |
| QBert         | 111                | 90              |
| SpaceInvaders | 115                | 100             |

2. **PPO**

| env           | XingTian PPO | RLlib PPO |
| ------------- | ------------ | --------- |
| BeamRider     | 1775         | 1618      |
| Breakout      | 1801         | 1535      |
| QBert         | 1741         | 1617      |
| SpaceInvaders | 1858         | 1608      |

> Experiment hardware: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with single Tesla V100

## Acknowledgement

XingTian refers to the following projects: [DeepMind/scalable_agent](https://github.com/deepmind/scalable_agent), [baselines](https://github.com/openai/baselines), [ray](https://github.com/ray-project/ray).

## License

The MIT License (MIT)

README.md (+165)
<div align="center">
<img width="300px" height="auto" src="./docs/.images/xingtian-logo.png">
</div>

[中文](./README.cn.md)

## Introduction

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**XingTian (刑天)** is a componentized library for the development and verification of reinforcement learning algorithms. It supports multiple algorithms, including DQN, DDPG, PPO, and IMPALA, and can train agents in multiple environments, such as Gym, Atari, Torcs, and StarCraft. To meet users' needs for quickly verifying and solving RL problems, four modules are abstracted: `Algorithm`, `Model`, `Agent`, and `Environment`. They work like a combination of "Lego" building blocks. For details about the architecture, see the [Architecture introduction](./docs/basic_arch.en.md).

## Dependencies

```shell
# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev redis-server -y
pip3 install opencv-python

# run with tensorflow 1.15.0
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 line_profiler redis absl-py psutil
```

Alternatively, install the dependencies with `pip3 install -r requirements.txt`.

If you want to use PyTorch as the backend, please install it yourself. [Ref Pytorch](https://pytorch.org/get-started/locally/)
## Installation

```zsh
# cd PATH/TO/XingTian
pip3 install -e .
```

After installation, run `import xt; print(xt.__version__)` to check whether the installation was successful.

```python
In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.1.1'
```
## Quick Start

---------
#### Setup configuration

The following configuration shows a minimal example with the [Cartpole](https://gym.openai.com/envs/CartPole-v0/) environment. A more detailed description of the agent, algorithm, and environment parameters can be found in the [User guide](./docs/user.en.md).

```yaml
alg_para:
  alg_name: PPO
env_para:
  env_name: GymEnv
  env_info: {'name': CartPole-v0, 'vision': False}
agent_para:
  agent_name: CartpolePpo
  agent_num : 1
  agent_config: {
    'max_steps': 200,
    'complete_step': 500000}
model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    summary: True

env_num: 10
```
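Since these files are plain YAML, a configuration can be sanity-checked before launching a run. A minimal sketch using PyYAML (already in the dependency list above); the `config_text` excerpt is taken from the example configuration:

```python
import yaml

# Excerpt of the CartPole example configuration above.
config_text = """
alg_para:
  alg_name: PPO
env_para:
  env_name: GymEnv
  env_info: {'name': CartPole-v0, 'vision': False}
env_num: 10
"""

# safe_load parses both block and flow (inline {...}) mappings.
config = yaml.safe_load(config_text)
print(config["alg_para"]["alg_name"])          # PPO
print(config["env_para"]["env_info"]["name"])  # CartPole-v0
```

The same check works on any file under `examples/` by replacing `config_text` with `open(path).read()`.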

In addition, you can find more configuration sets in the [examples](./examples) directory.

#### Start training task

```python3 xt/main.py -f examples/cartpole_ppo.yaml -t train```

![img](./docs/.images/cartpole.gif)

#### Evaluate trained model

Set `test_node_config` and `test_model_path` in your `.yaml` file, then run with `-t evaluate`:

```python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate```

> NOTE: XingTian starts with `-t train` by default.

#### Run with CLI

```zsh
# You can replace `python3 xt/main.py` with the `xt_main` command!
xt_main -f examples/cartpole_ppo.yaml -t train
```
## Develop with a custom case

1. Write a custom module and register it. More detailed guidance on custom modules can be found in the [Developer Guide](./docs/developer.en.md)
2. Add your custom module's name to `your_train_configure.yaml`
3. Start training with `xt_main -f path/to/your_train_configure.yaml` :)
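The register-then-configure flow in the steps above can be sketched generically. The names `MODEL_REGISTRY`, `register_model`, and `MyCartpoleMlp` below are hypothetical illustrations of the pattern, not XingTian's actual registration API (see the Developer Guide for that):

```python
# A minimal registry: config-file names map to module classes.
MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that records a model class under a config-file name."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap

# Step 1: write a custom module and register it.
@register_model("MyCartpoleMlp")
class MyCartpoleMlp:
    def __init__(self, state_dim, action_dim):
        self.state_dim = state_dim
        self.action_dim = action_dim

# Steps 2-3: the trainer looks up the class named in the YAML
# (e.g. model_para.actor.model_name) and instantiates it.
model_cls = MODEL_REGISTRY["MyCartpoleMlp"]
model = model_cls(state_dim=[4], action_dim=2)
print(type(model).__name__)  # MyCartpoleMlp
```

This is why registration must run before training starts: the framework resolves the string in your configuration file against the registry at launch time.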

## Reference Results

#### Episode Reward Average

1. **DQN** reward after 10M time-steps (**40M frames**).

| env           | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
| ------------- | ------------------ | --------------- | ----------------- |
| BeamRider     | 6706               | 2869            | ~2000             |
| Breakout      | 352                | 287             | ~150              |
| QBert         | 14087              | 3921            | ~4000             |
| SpaceInvaders | 947                | 650             | ~500              |

2. **PPO** reward after 10M time-steps (**40M frames**).

| env           | XingTian PPO | RLlib PPO | Baselines PPO |
| ------------- | ------------ | --------- | ------------- |
| BeamRider     | 4204         | 2807      | ~1800         |
| Breakout      | 243          | 104       | ~250          |
| QBert         | 12288        | 11085     | ~14000        |
| SpaceInvaders | 1135         | 671       | ~800          |

#### Throughput

1. **DQN**

| env           | XingTian Basic DQN | RLlib Basic DQN |
| ------------- | ------------------ | --------------- |
| BeamRider     | 129                | 109             |
| Breakout      | 117                | 113             |
| QBert         | 111                | 90              |
| SpaceInvaders | 115                | 100             |

2. **PPO**

| env           | XingTian PPO | RLlib PPO |
| ------------- | ------------ | --------- |
| BeamRider     | 1775         | 1618      |
| Breakout      | 1801         | 1535      |
| QBert         | 1741         | 1617      |
| SpaceInvaders | 1858         | 1608      |

> Experiment condition: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with single Tesla V100

## Acknowledgement

XingTian refers to the following projects: [DeepMind/scalable_agent](https://github.com/deepmind/scalable_agent), [baselines](https://github.com/openai/baselines), [ray](https://github.com/ray-project/ray).

## License

The MIT License (MIT)
