
Commit a21387d

update qmix_att.yaml
1 parent 4fe05d4 commit a21387d

File tree

12 files changed: +1697 -11 lines changed

README.md

Lines changed: 26 additions & 9 deletions
@@ -1,10 +1,13 @@
+
 # RMC
 Open-source code for [Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2102.03479).
 
 This repository is fine-tuned for the StarCraft Multi-Agent Challenge (SMAC). For other multi-agent tasks, we also recommend an optimized implementation of QMIX: https://github.com/marlbenchmark/off-policy.
 
-
-## Code-level Optimizations
+```
+2021.10.4 update: add QMIX with attention (qmix_att.yaml) as a baseline for Communication tasks.
+```
+## Finetuned-QMIX
 There are many code-level tricks in Multi-Agent Reinforcement Learning (MARL), such as:
 - Value function clipping (clip max Q values for QMIX)
 - Value Normalization
@@ -26,8 +29,7 @@ There are so many code-level tricks in the Multi-agent Reinforcement Learning (
 - What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
 - The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games
 
-### Finetuned-QMIX
-Using a few of tricks above (bold texts), we enabled QMIX to solve almost all hard scenarios of SMAC (fine-tuned QMIX for each scenarios).
+Using a few of the tricks above (in bold), we enable QMIX to solve almost all of the hard SMAC scenarios (hyperparameters fine-tuned per scenario; StarCraft 2 version: SC2.4.10).
 
 
 | Scenarios | Difficulty | QMIX (batch_size=128) | Finetuned-QMIX |
@@ -50,7 +52,7 @@ Using a few of tricks above (bold texts), we enabled QMIX to solve almost all ha
 
 
 ## Re-Evaluation
-Afterwards, we re-evaluate numerous QMIX variants with normalized the tricks (a **genaral** set of hyperparameters), and find that QMIX achieves the SOTA (StarCraft 2, SC2.4.10).
+Afterwards, we re-evaluate numerous QMIX variants with the tricks normalized (a **general** set of hyperparameters), and find that QMIX achieves the SOTA (StarCraft 2 version: SC2.4.10).
 
 | Scenarios | Difficulty | Value-based | | | | | Policy-based | | | |
 |----------------|----------------|:---------------:|:--------------:|:---------------:|:--------------:|:--------------:|:--------------:|--------|:------:|:--------------:|
@@ -67,7 +69,16 @@ Afterwards, we re-evaluate numerous QMIX variants with normalized the tricks (a
 | Discrete PP | - | **40** | 39 | - | 39 | 39 | 30 | 39 | 32 | 38 |
 | Avg. Score | Hard+ | **94.9%** | 91.2% | 92.7% | 92.5% | 67.4% | 29.2% | 67.4% | 44.1% | 84.0% |
 
-## PyMARL
+## Communication
+We also tested our QMIX-with-attention (qmix_att.yaml, $\lambda=0.3$, attention\_heads=4) on some maps from [NDQ](https://github.com/TonghanWang/NDQ) that require communication (StarCraft 2 version: SC2.4.10).
+
+| Scenarios | Difficulty | QMIX (batch_size=128, no communication) | QMIX-with-attention (communication) |
+|----------------|:----------:|:--------------:|:----------------------------------:|
+| 1o_10b_vs_1r (2M steps) | - | 56% | **87%** |
+| 1o_2r_vs_4r (2M steps) | - | 50% | **95%** |
+| bane_vs_hM | - | 0% | **0%** |
+
+# Usage
 
 PyMARL is [WhiRL](http://whirl.cs.ox.ac.uk)'s framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:
 
@@ -89,7 +100,7 @@ Actor Critic Methods:
 - [**DOP**: Off-Policy Multi-Agent Decomposed Policy Gradients](https://arxiv.org/abs/2007.12322)
 - [**RMC**: Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2102.03479)
 
-### Installation instructions
+## Installation instructions
 
 Install Python packages
 ```shell
@@ -104,7 +115,7 @@ bash install_sc2.sh
 
 This will download SC2.4.10 into the 3rdparty folder and copy the maps necessary to run the experiments.
 
-### Command Line Tool
+## Command Line Tool
 
 **Run an experiment**
 
@@ -118,6 +129,12 @@ python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corrid
 python3 src/main.py --config=qmix_predator_prey --env-config=stag_hunt with env_args.map_name=stag_hunt
 ```
 
+```shell
+# For Communication tasks
+python3 src/main.py --config=qmix_att --env-config=sc2 with env_args.map_name=1o_10b_vs_1r
+```
+
+
 The config files act as defaults for an algorithm or environment.
 
 They are all located in `src/config`.
@@ -142,7 +159,7 @@ All results will be stored in the `Results` folder and named with `map_name`.
 bash clean.sh
 ```
 
-## Cite
+# Cite
 ```
 @article{hu2021revisiting,
   title={Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning},
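
The Communication section added above pairs the new `qmix_att.yaml` baseline with an attention-based agent (`agent: "att_rnn"`, `att_heads: 4`, `att_embed_dim: 32`). The agent implementation itself is not part of this commit; the following PyTorch sketch only illustrates the general idea of letting each agent's recurrent hidden state attend over all agents' hidden states before producing Q-values. The class name and layer layout are assumptions for illustration, not the repository's actual `att_rnn` code.

```python
# Illustrative sketch only (not the repository's actual `att_rnn` agent).
# Each agent's GRU hidden state attends over all agents' hidden states
# ("communication") before Q-values are produced. Head and embedding sizes
# follow qmix_att.yaml (att_heads=4, att_embed_dim=32). Requires PyTorch >= 1.9
# for batch_first multi-head attention.
import torch
import torch.nn as nn


class AttentionRNNAgentSketch(nn.Module):
    def __init__(self, input_shape, n_actions, rnn_hidden_dim=64,
                 att_heads=4, att_embed_dim=32):
        super().__init__()
        self.fc1 = nn.Linear(input_shape, rnn_hidden_dim)
        self.rnn = nn.GRUCell(rnn_hidden_dim, rnn_hidden_dim)
        self.proj_in = nn.Linear(rnn_hidden_dim, att_embed_dim)
        # Multi-head self-attention across the agent dimension.
        self.attn = nn.MultiheadAttention(embed_dim=att_embed_dim,
                                          num_heads=att_heads,
                                          batch_first=True)
        self.fc2 = nn.Linear(rnn_hidden_dim + att_embed_dim, n_actions)

    def forward(self, inputs, hidden_state):
        # inputs: (batch, n_agents, input_shape)
        # hidden_state: (batch, n_agents, rnn_hidden_dim)
        b, n, _ = inputs.shape
        x = torch.relu(self.fc1(inputs))
        h = self.rnn(x.reshape(b * n, -1), hidden_state.reshape(b * n, -1))
        h = h.view(b, n, -1)
        # Each agent attends over every agent's hidden state.
        q_in = self.proj_in(h)
        attended, _ = self.attn(q_in, q_in, q_in)
        q_values = self.fc2(torch.cat([h, attended], dim=-1))
        return q_values, h
```

The attended features act as a learned, differentiable channel between agents, which is what the communication maps in the table above (1o_10b_vs_1r, 1o_2r_vs_4r, bane_vs_hM) are probing.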

install_sc2.sh

Lines changed: 5 additions & 0 deletions
@@ -1,6 +1,8 @@
 #!/bin/bash
 # Install SC2 and add the custom maps
 
+smac_mpas=$(pwd)/smac_maps
+
 cd "$HOME"
 export SC2PATH="$HOME/StarCraftII"
 echo 'SC2PATH is set to '$SC2PATH
@@ -25,7 +27,10 @@ fi
 cd ..
 wget https://github.com/oxwhirl/smac/releases/download/v0.1-beta1/SMAC_Maps.zip
 unzip SMAC_Maps.zip
+
+cp -r "$smac_mpas"/*.SC2Map ./SMAC_Maps
 mv SMAC_Maps $MAP_DIR
 rm -rf SMAC_Maps.zip
 
+
 echo 'StarCraft II and SMAC are installed.'
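
The additions above stage the repository's custom maps (`smac_maps/*.SC2Map`) and copy them into the downloaded `SMAC_Maps` folder before it is moved into `$MAP_DIR`. As an optional sanity check, a few lines of Python can confirm the three new communication maps ended up where SMAC will look for them; the path below assumes the default `SC2PATH=$HOME/StarCraftII` set by this script and that `$MAP_DIR` points at `$SC2PATH/Maps`, so adjust it if your setup differs.

```python
# Optional sanity check (not part of the repo): confirm the custom maps
# copied by install_sc2.sh are present in the SC2 map directory.
import os

map_dir = os.path.join(os.path.expanduser("~"), "StarCraftII", "Maps", "SMAC_Maps")
for name in ("1o_10b_vs_1r", "1o_2r_vs_4r", "bane_vs_hM"):
    path = os.path.join(map_dir, name + ".SC2Map")
    status = "found" if os.path.isfile(path) else "MISSING"
    print(f"{name}: {status} ({path})")
```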

smac_maps/1o_10b_vs_1r.SC2Map

32.6 KB
Binary file not shown.

smac_maps/1o_2r_vs_4r.SC2Map

21.1 KB
Binary file not shown.

smac_maps/bane_vs_hM.SC2Map

17.9 KB
Binary file not shown.

src/config/algs/qmix_att.yaml

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# --- QMIX specific parameters ---
+
+# use epsilon greedy action selector
+action_selector: "epsilon_greedy"
+epsilon_start: 1.0
+epsilon_finish: 0.05
+epsilon_anneal_time: 100000 # 500000 for 6h_vs_8z
+
+runner: "parallel"
+batch_size_run: 8
+buffer_size: 5000
+batch_size: 128
+optimizer: 'adam'
+
+t_max: 10050000
+
+# update the target network every {} episodes
+target_update_interval: 200
+
+# use the Q_Learner to train
+mac: "n_mac"
+agent: "att_rnn" # self-attention for communication
+agent_output_type: q
+att_heads: 4
+att_embed_dim: 32
+
+learner: "nq_learner"
+mixer: "qmix"
+mixing_embed_dim: 32
+hypernet_embed: 64
+lr: 0.001 # Learning rate for agents
+td_lambda: 0.3
+optimizer: 'adam'
+
+grad_norm_clip: 20.0
+
+name: "qmix_att_env=8_adam_td_lambda"
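
The `td_lambda: 0.3` setting above means the learner regresses onto λ-returns rather than plain one-step TD targets. As a reference for what that target is, here is a minimal, self-contained sketch of the standard backward λ-return recursion; the function name, tensor layout, and masking convention are assumptions for illustration, not the repository's `nq_learner` code.

```python
# Illustrative lambda-return targets (not the repository's nq_learner code).
# Recursion: G_t = r_t + gamma * [(1 - lambda) * V(s_{t+1}) + lambda * G_{t+1}],
# computed backwards over the episode.
import torch


def td_lambda_targets(rewards, terminated, target_qs, gamma=0.99, td_lambda=0.3):
    # rewards, terminated: (batch, T); target_qs: (batch, T + 1) bootstrap values
    # from the target network (e.g. the mixed Q_tot in QMIX). Returns (batch, T).
    batch, T = rewards.shape
    targets = torch.zeros_like(rewards)
    g = target_qs[:, T]  # bootstrap value after the final transition
    for t in reversed(range(T)):
        nonterminal = 1.0 - terminated[:, t]
        g = rewards[:, t] + gamma * nonterminal * (
            (1.0 - td_lambda) * target_qs[:, t + 1] + td_lambda * g
        )
        targets[:, t] = g
    return targets


# Shapes-only example: a batch of 8 episodes, 60 steps each.
targets = td_lambda_targets(torch.rand(8, 60), torch.zeros(8, 60), torch.rand(8, 61))
```

With `td_lambda: 0` this collapses to the usual one-step QMIX target, while larger values propagate rewards faster at the cost of some extra variance; 0.3 sits near the one-step end of that trade-off.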

src/envs/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -2,7 +2,9 @@
 import sys
 import os
 
-from smac.env import MultiAgentEnv, StarCraft2Env
+from .multiagentenv import MultiAgentEnv
+
+from .starcraft import StarCraft2Env
 from .one_step_matrix_game import OneStepMatrixGame
 from .stag_hunt import StagHunt
 
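
The rest of `src/envs/__init__.py` is not shown in this diff. In PyMARL-style codebases these imports typically feed an environment registry that maps the `--env-config` name used on the command line (e.g. `sc2`, `stag_hunt` in the README examples) to a constructor. A sketch of that pattern, where any registry key other than `sc2` and `stag_hunt` is an assumption:

```python
# Sketch of the registry that usually follows these imports in PyMARL-style
# repos (the remainder of src/envs/__init__.py is not part of this diff).
from functools import partial

from .multiagentenv import MultiAgentEnv
from .starcraft import StarCraft2Env
from .one_step_matrix_game import OneStepMatrixGame
from .stag_hunt import StagHunt


def env_fn(env, **kwargs) -> MultiAgentEnv:
    # Defer construction so env_args from the config can be injected at run time.
    return env(**kwargs)


REGISTRY = {
    "sc2": partial(env_fn, env=StarCraft2Env),
    "one_step_matrix_game": partial(env_fn, env=OneStepMatrixGame),
    "stag_hunt": partial(env_fn, env=StagHunt),
}
```

The import change itself simply points this registry at a local copy of `StarCraft2Env` (`.starcraft`) instead of the class installed from the `smac` package, leaving the interface unchanged.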
