Commit ffa862d

update 3s5z_vs_3s6z in qmix_large.yaml
update
Update README.md
add: ablation_study_image
Update README.md
add weight_decay
update
update

1 parent dbb929a · commit ffa862d

12 files changed: +59, -9 lines changed

README.md

Lines changed: 6 additions & 5 deletions
@@ -18,12 +18,13 @@ There are so many code-level tricks in the Multi-agent Reinforcement Learning (
 - Reward scaling
 - Orthogonal initialization and layer scaling
 - **Adam**
+- **Neural networks hidden size**
 - learning rate annealing
 - Reward Clipping
 - Observation Normalization
 - Gradient Clipping
 - **Large Batch Size**
-- **N-step Returns(including GAE($\lambda$) and Q($\lambda$))**
+- **N-step Returns(including GAE($\lambda$) and Q($\lambda$) ...)**
 - **Rollout Process Number**
 - **$\epsilon$-greedy annealing steps**
 - Death Agent Masking
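
The **$\epsilon$-greedy annealing steps** trick in the list above is what the epsilon_start / epsilon_finish / epsilon_anneal_time keys in qmix_large.yaml (added later in this commit) control. A minimal sketch of a linear annealing schedule of that shape, in plain Python; the helper names are placeholders for illustration, not the repository's action selector:

```python
import random

def epsilon_at(t, start=1.0, finish=0.05, anneal_time=100_000):
    """Linearly anneal epsilon from `start` to `finish` over `anneal_time` env steps."""
    frac = min(t / anneal_time, 1.0)
    return start + frac * (finish - start)

def select_action(q_values, t):
    """Epsilon-greedy choice over a list of per-action Q-values at env step t."""
    if random.random() < epsilon_at(t):
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```

With the defaults above, epsilon_at(0) is 1.0, epsilon_at(50_000) is 0.525, and from 100,000 steps onward epsilon stays at 0.05, matching the epsilon_anneal_time: 100000 setting.
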
@@ -50,17 +51,17 @@ Using a few of tricks above (bold texts), we enabled QMIX (qmix.yaml) to solve a
 | 2c_vs_64zg | Hard |**100\%**| **100\%** |
 | corridor | Super Hard | 0% | **100\%** |
 | MMM2 | Super Hard | 98% | **100\%** |
-| 3s5z_vs_3s6z | Super Hard | 3% |**85\%**(Number of Envs = 4) |
+| 3s5z_vs_3s6z | Super Hard | 3% |**93\%**(hidden_size = 256, qmix_large.yaml) |
 | 27m_vs_30m | Super Hard | 56% | **100\%** |
 | 6h_vs_8z | Super Hard | 0% | **93\%**($\lambda$ = 0.3) |
 
 
 ## Re-Evaluation
-Afterwards, we re-evaluate numerous QMIX variants with normalized the tricks (a **genaral** set of hyperparameters), and find that QMIX achieves the SOTA.
+Afterwards, we re-evaluate numerous QMIX variants with the tricks normalized (a **general** set of hyperparameters), and find that QMIX achieves SOTA.
 
 | Scenarios | Difficulty | Value-based | | | | | Policy-based | | | |
 |----------------|----------------|:---------------:|:--------------:|:---------------:|:--------------:|:--------------:|:--------------:|--------|:------:|:--------------:|
-| | | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RMC |
+| | | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RIIT |
 | 2c_vs_64zg | Hard | **100%** | **100%** | **100%** | **100%** | 93% | **100%** | 98% | 84% | **100%** |
 | 8m_vs_9m | Hard | **100%** | **100%** | **100%** | 95% | 90% | 48% | 75% | 96% | 95% |
 | 3s_vs_5z | Hard | **100%** | **100%** | **100%** | **100%** | **100%** | 3% | 96% | **100%** | 96% |
@@ -102,7 +103,7 @@ Actor Critic Methods:
 - [**VMIX**: Value-Decomposition Multi-Agent Actor-Critics](https://arxiv.org/abs/2007.12306)
 - [**LICA**: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2007.02529)
 - [**DOP**: Off-Policy Multi-Agent Decomposed Policy Gradients](https://arxiv.org/abs/2007.12322)
-- [**RMC**: Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2102.03479)
+- [**RIIT**: Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning.](https://arxiv.org/abs/2102.03479)
 
 ## Installation instructions
 
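
The **N-step Returns (including GAE($\lambda$) and Q($\lambda$))** trick from the list above, and the $\lambda$ values quoted in the tables (td_lambda: 0.6 in qmix_large.yaml, $\lambda$ = 0.3 for 6h_vs_8z), refer to $\lambda$-return targets. A minimal sketch of the standard backward recursion, assuming per-step rewards and a bootstrap value from a target network; function and argument names are placeholders, not the repository's learner code:

```python
from typing import List

def td_lambda_targets(rewards: List[float],
                      bootstrap_q: List[float],
                      gamma: float = 0.99,
                      td_lambda: float = 0.6) -> List[float]:
    """Lambda-return targets for one episode.

    rewards[t]     -- reward received after step t
    bootstrap_q[t] -- max_a Q_target(s_{t+1}, a); use 0.0 at the terminal step
    Recursion: G_t = r_t + gamma * ((1 - lambda) * bootstrap_q[t] + lambda * G_{t+1})
    """
    targets = [0.0] * len(rewards)
    g_next = 0.0  # return beyond the end of the episode
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * ((1.0 - td_lambda) * bootstrap_q[t]
                                       + td_lambda * g_next)
        targets[t] = g_next
    return targets
```

Setting td_lambda = 0 recovers ordinary one-step TD targets and td_lambda = 1 recovers the Monte-Carlo return, so the 0.3 and 0.6 values above trade bias against variance between those extremes.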

ablation_study/exploration.pdf (23.6 KB, binary file not shown)
ablation_study/network_size.pdf (22.8 KB, binary file not shown)
ablation_study/optimizer1.pdf (26.5 KB, binary file not shown)
ablation_study/optimizer2.pdf (25.5 KB, binary file not shown)
ablation_study/process_number.pdf (22.2 KB, binary file not shown)
ablation_study/replay_buffer.pdf (26.2 KB, binary file not shown)
ablation_study/td_lambda.pdf (30.1 KB, binary file not shown)

src/config/algs/qmix_large.yaml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# --- QMIX specific parameters with large networks ---
+# for 3s5z_vs_3s6z
+
+# use epsilon greedy action selector
+action_selector: "epsilon_greedy"
+epsilon_start: 1.0
+epsilon_finish: 0.05
+epsilon_anneal_time: 100000
+
+runner: "parallel"
+batch_size_run: 8
+buffer_size: 5000
+batch_size: 128
+optimizer: 'adam'
+
+t_max: 10050000
+
+# update the target network every {} episodes
+target_update_interval: 200
+
+# use the Q_Learner to train
+mac: "n_mac"
+agent: "n_rnn"
+agent_output_type: q
+rnn_hidden_dim: 256
+
+learner: "nq_learner"
+mixer: "qmix"
+mixing_embed_dim: 64
+hypernet_embed: 256
+lr: 0.001 # Learning rate for agents
+td_lambda: 0.6
+optimizer: 'adam'
+weight_decay: 0
+
+# rnn layer normalization
+use_layer_norm: False
+
+# orthogonal init for DNN
+use_orthogonal: False
+gain: 0.01
+
+# Priority experience replay
+use_per: False
+per_alpha: 0.6
+per_beta: 0.4
+return_priority: False
+
+name: "qmix_large_env=8_adam_td_lambda"

src/config/algs/riit.yaml

Lines changed: 1 addition & 1 deletion
@@ -39,4 +39,4 @@ entropy_coef: 0.03
 optimizer: 'adam'
 abs: True # monotonicity condition
 
-name: "rmc_env=8_adam_td_lambda"
+name: "riit_env=8_adam_td_lambda"
