Commit cef6b37

xt release 0.3.0
1 parent a9bdde7 commit cef6b37

384 files changed

+19065
-4085
lines changed


README.cn.md

Lines changed: 99 additions & 42 deletions
@@ -5,21 +5,21 @@
55

66
[English](./README.md)
77

8-
## Introduction
8+
## Introduction
99

1010
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
1111

12-
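**XingTian (刑天)** is a componentized reinforcement learning library for developing and validating reinforcement learning algorithms. It currently supports algorithm families including DQN, DDPG, PPO, and IMPALA, and can train agents in a variety of environments such as Gym, Atari, Torcs, and StarCraft. To help users quickly validate and solve RL problems, XingTian abstracts four modules: `Algorithm`, `Model`, `Agent`, and `Environment`. They fit together like Lego bricks. For more details, see the [architecture introduction](./docs/basic_arch.cn.md). For usage questions, feel free to open an issue or join our QQ group (833345709) for discussion.
12+
**XingTian (刑天)** is a componentized reinforcement learning library for developing and validating reinforcement learning algorithms. It currently supports algorithm families including DQN, DDPG, PPO, and IMPALA, and can train agents in a variety of environments such as Gym, Atari, Torcs, and StarCraftII. To help users quickly validate and solve RL problems, XingTian abstracts four modules: `Algorithm`, `Model`, `Agent`, and `Environment`. They fit together like Lego bricks. For more details, see the [architecture introduction](./docs/basic_arch.cn.md).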
1313

1414
## System dependencies
1515

1616
```shell
1717
# ubuntu 18.04
18-
sudo apt-get install python3-pip libopencv-dev redis-server -y
18+
sudo apt-get install python3-pip libopencv-dev -y
1919
pip3 install opencv-python
2020

21-
# run with tensorflow 1.15.0
22-
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 line_profiler redis absl-py psutil
21+
# Run with tensorflow 1.15.0 or tensorflow 2.3.1
22+
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 absl-py psutil tensorboardX setproctitle
2323
```
2424

2525
You can also install the dependencies with pip: `pip3 install -r requirements.txt`
@@ -31,17 +31,17 @@ pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml t
3131

3232
## Installation
3333
```zsh
34-
# cd PATH/TO/XingTian
34+
# cd PATH/TO/XingTian
3535
pip3 install -e .
3636
```
3737

38-
Run `import xt; print(xt.__Version__)` to confirm that the installation succeeded.
38+
Run `import xt; print(xt.__Version__)` to confirm that the installation succeeded.
3939

4040
```python
4141
In [1]: import xt
4242

4343
In [2]: xt.__version__
44-
Out[2]: '0.2.0'
44+
Out[2]: '0.3.0'
4545
```
4646

4747

@@ -56,21 +56,43 @@ Out[2]: '0.2.0'
5656
```yaml
5757
alg_para:
5858
  alg_name: PPO
59+
  alg_config:
60+
    process_num: 1
61+
    save_model: True # default False
62+
    save_interval: 100
63+
5964
env_para:
6065
  env_name: GymEnv
61-
  env_info: {'name': CartPole-v0, 'vision': False}
66+
  env_info:
67+
    name: CartPole-v0
68+
    vision: False
69+
6270
agent_para:
63-
  agent_name: CartpolePpo
71+
  agent_name: PPO
6472
  agent_num : 1
65-
  agent_config: {
66-
    'max_steps': 200,
67-
    'complete_step': 500000}
73+
  agent_config:
74+
    max_steps: 200
75+
    complete_step: 1000000
76+
    complete_episode: 3550
77+
6878
model_para:
6979
  actor:
70-
    model_name: PpoMlp
80+
    model_name: PpoMlp
7181
    state_dim: [4]
7282
    action_dim: 2
73-
    summary: True
83+
    input_dtype: float32
84+
    model_config:
85+
      BATCH_SIZE: 200
86+
      CRITIC_LOSS_COEF: 1.0
87+
      ENTROPY_LOSS: 0.01
88+
      LR: 0.0003
89+
      LOSS_CLIPPING: 0.2
90+
      MAX_GRAD_NORM: 5.0
91+
      NUM_SGD_ITER: 8
92+
      SUMMARY: False
93+
      VF_SHARE_LAYERS: False
94+
      activation: tanh
95+
      hidden_sizes: [64, 64]
7496

7597
env_num: 10
7698
```
@@ -85,11 +107,20 @@ env_num: 10
85107

86108

87109

88-
#### Evaluating a model
110+
#### Evaluating a local model
89111

90-
For an evaluation task, set the `test_node_config` and `test_model_path` parameters in your `.yaml` file, then run the evaluation with `-t evaluate`.
112+
Set the `benchmark.eval.model_path` parameter in your `.yaml` file, then run the evaluation with `-t evaluate`.
91113

92-
```python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate```
114+
```
115+
benchmark:
116+
eval:
117+
model_path: /YOUR/PATH/TO/EVAL/models
118+
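    gap: 10 # interval between the models to evaluate in the directory
119+
    evaluator_num: 1 # number of evaluator instances to launch; parallel evaluation is supported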
120+
121+
# run the command
122+
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate
123+
```
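
As a rough illustration of the `gap` setting in the `benchmark.eval` block above, the sketch below assumes that `gap` means "evaluate every `gap`-th saved model" once the checkpoints are ordered by training step. That reading is inferred only from the inline comment; the helper name `select_models` and the checkpoint file names are hypothetical and are not part of XingTian.

```python
import re


def select_models(checkpoints, gap):
    """Return every `gap`-th checkpoint, ordered by the step number in its name.

    Assumption: each name ends with an integer training step (hypothetical
    naming such as "actor_00300"); names without a trailing number are skipped.
    """
    if gap < 1:
        raise ValueError("gap must be a positive integer")
    numbered = []
    for name in checkpoints:
        match = re.search(r"(\d+)$", name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()
    return [name for _, name in numbered[::gap]]


# Hypothetical checkpoints saved every 100 steps (cf. save_interval: 100).
saved = ["actor_{:05d}".format(step) for step in range(100, 3100, 100)]
picked = select_models(saved, gap=10)
print(picked)
```

Under this assumption, a larger `gap` trades evaluation coverage for a shorter evaluation run, while `evaluator_num` controls how many of the selected models could be evaluated in parallel.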
93124
94125
> By default the system starts a training task; that is, the default value of `-t` is `train`.
95126
@@ -98,6 +129,9 @@ env_num: 10
98129
```zsh
99130
# In a terminal, you can use xt_main in place of python3 xt/main.py to run commands
100131
xt_main -f examples/cartpole_ppo.yaml -t train
132+
133+
# train with evaluate
134+
xt_main -f examples/cartpole_ppo.yaml -t train_with_evaluate
101135
```
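
The command-line behaviour described in this hunk (a `-f` config file, plus a `-t` task type that defaults to `train` and also accepts `evaluate` and `train_with_evaluate`) can be sketched with `argparse`. This is a hypothetical stand-in for illustration, not XingTian's actual parser; the `dest` names `config_file` and `task` are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the xt_main command line; not XingTian's code."""
    parser = argparse.ArgumentParser(prog="xt_main")
    parser.add_argument("-f", dest="config_file", required=True,
                        help="path to the task .yaml file")
    parser.add_argument("-t", dest="task",
                        choices=["train", "evaluate", "train_with_evaluate"],
                        default="train",
                        help="task type; train is used when -t is omitted")
    return parser


parser = build_parser()
default_args = parser.parse_args(["-f", "examples/cartpole_ppo.yaml"])
eval_args = parser.parse_args(["-f", "examples/cartpole_ppo.yaml", "-t", "evaluate"])
print(default_args.task, eval_args.task)  # train evaluate
```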
102136

103137
## Developing custom tasks
@@ -112,43 +146,66 @@ xt_main -f examples/cartpole_ppo.yaml -t train
112146

113147
1. **DQN** converged reward after 10M steps (**40M frames**).
114148

115-
| env | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
116-
| ------------- | ------------------ | --------------- | ----------------- |
117-
| BeamRider | 6706 | 2869 | ~2000 |
118-
| Breakout | 352 | 287 | ~150 |
119-
| QBert | 14087 | 3921 | ~4000 |
120-
| SpaceInvaders | 947 | 650 | ~500 |
149+
| env | XingTian Basic DQN | RLlib Basic DQN | Hessel et al. DQN |
150+
| ------------- | ------------------ | --------------- | ----------------- |
151+
| BeamRider | 6706 | 2869 | ~2000 |
152+
| Breakout | 352 | 287 | ~150 |
153+
| QBert | 14087 | 3921 | ~4000 |
154+
| SpaceInvaders | 947 | 650 | ~500 |
121155

122156
2. **PPO** converged reward after 10M steps (**40M frames**).
123157

124-
| env | XingTian PPO | RLlib PPO | Baselines PPO |
125-
| ------------- | ------------ | --------- | ------------- |
126-
| BeamRider | 4204 | 2807 | ~1800 |
127-
| Breakout | 243 | 104 | ~250 |
128-
| QBert | 12288 | 11085 | ~14000 |
129-
| SpaceInvaders | 1135 | 671 | ~800 |
158+
| env | XingTian PPO | RLlib PPO | Baselines PPO |
159+
| ------------- | ------------ | --------- | ------------- |
160+
| BeamRider | 4877 | 2807 | ~1800 |
161+
| Breakout | 341 | 104 | ~250 |
162+
| QBert | 14771 | 11085 | ~14000 |
163+
| SpaceInvaders | 1025 | 671 | ~800 |
164+
165+
3. **IMPALA** converged reward after 10M steps (**40M frames**).
166+
167+
| env | XingTian IMPALA | RLlib IMPALA |
168+
| ------------- | --------------- | ------------ |
169+
| BeamRider | 2313 | 2071 |
170+
| Breakout | 334 | 385 |
171+
| QBert | 12205 | 4068 |
172+
| SpaceInvaders | 742 | 719 |
173+
174+
130175

131176
#### Throughput
132177

133178
1. **DQN**
134179

135-
| env | XingTian Basic DQN | RLlib Basic DQN |
136-
| ------------- | ------------------ | --------------- |
137-
| BeamRider | 129 | 109 |
138-
| Breakout | 117 | 113 |
139-
| QBert | 111 | 90 |
140-
| SpaceInvaders | 115 | 100 |
180+
| env | XingTian Basic DQN | RLlib Basic DQN |
181+
| ------------- | ------------------ | --------------- |
182+
| BeamRider | 129 | 109 |
183+
| Breakout | 117 | 113 |
184+
| QBert | 111 | 90 |
185+
| SpaceInvaders | 115 | 100 |
186+
141187

142188
2. **PPO**
143189

144-
| env | XingTian PPO | RLlib PPO |
145-
| ------------- | ------------ | --------- |
146-
| BeamRider | 1994 | 1618 |
147-
| Breakout | 2033 | 1535 |
148-
| QBert | 2086 | 1617 |
149-
| SpaceInvaders | 2037 | 1608 |
190+
| env | XingTian PPO | RLlib PPO |
191+
| ------------- | ------------ | --------- |
192+
| BeamRider | 2422 | 1618 |
193+
| Breakout | 2497 | 1535 |
194+
| QBert | 2436 | 1617 |
195+
| SpaceInvaders | 2438 | 1608 |
196+
197+
3. **IMPALA**
198+
199+
| env | XingTian IMPALA | RLlib IMPALA |
200+
| ------------- | --------------- | ------------ |
201+
| BeamRider | 8756 | 3637 |
202+
| Breakout | 8814 | 3525 |
203+
| QBert | 8249 | 3471 |
204+
| SpaceInvaders | 8463 | 3555 |
150205

151206
> Experiment hardware: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with a single Tesla V100
207+
>
208+
> The Ray reward data comes from [https://github.com/ray-project/rl-experiments](https://github.com/ray-project/rl-experiments); the throughput figures were measured on the hardware listed above.
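
To put the throughput tables in relative terms, the short sketch below computes the mean XingTian-to-RLlib throughput ratio per algorithm. The figures are copied from the added (`+`) rows of this diff; averaging the per-environment ratios is just one reasonable way to summarize them and is not a metric reported by the README.

```python
# Throughput figures copied from the added rows of the tables: (XingTian, RLlib).
throughput = {
    "DQN": {"BeamRider": (129, 109), "Breakout": (117, 113),
            "QBert": (111, 90), "SpaceInvaders": (115, 100)},
    "PPO": {"BeamRider": (2422, 1618), "Breakout": (2497, 1535),
            "QBert": (2436, 1617), "SpaceInvaders": (2438, 1608)},
    "IMPALA": {"BeamRider": (8756, 3637), "Breakout": (8814, 3525),
               "QBert": (8249, 3471), "SpaceInvaders": (8463, 3555)},
}


def mean_speedup(rows):
    """Average the per-environment XingTian / RLlib throughput ratios."""
    ratios = [xingtian / rllib for xingtian, rllib in rows.values()]
    return sum(ratios) / len(ratios)


speedups = {alg: round(mean_speedup(rows), 2) for alg, rows in throughput.items()}
for alg, ratio in speedups.items():
    print("{}: {:.2f}x RLlib throughput on average".format(alg, ratio))
```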
152209
153210
## Acknowledgements
154211
