Commit ffa862d

update 3s5z_vs_3s6z in qmix_large.yaml
update
Update README.md
add: ablation_study_image
Update README.md
add weight_decay
update
update

1 parent dbb929a · commit ffa862d

12 files changed: +59, -9 lines changed

README.md

Lines changed: 6 additions & 5 deletions
@@ -18,12 +18,13 @@ There are so many code-level tricks in the Multi-agent Reinforcement Learning (
 - Reward scaling
 - Orthogonal initialization and layer scaling
 - **Adam**
+- **Neural networks hidden size**
 - learning rate annealing
 - Reward Clipping
 - Observation Normalization
 - Gradient Clipping
 - **Large Batch Size**
-- **N-step Returns(including GAE($\lambda$) and Q($\lambda$))**
+- **N-step Returns(including GAE($\lambda$) and Q($\lambda$) ...)**
 - **Rollout Process Number**
 - **$\epsilon$-greedy annealing steps**
 - Death Agent Masking
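
The **$\epsilon$-greedy annealing steps** trick in the list above is what the epsilon_start / epsilon_finish / epsilon_anneal_time keys in qmix_large.yaml (added later in this commit) control. A minimal sketch of a linear annealing schedule of that shape, in plain Python; the helper names are placeholders for illustration, not the repository's action selector:

```python
import random

def epsilon_at(t, start=1.0, finish=0.05, anneal_time=100_000):
    """Linearly anneal epsilon from `start` to `finish` over `anneal_time` env steps."""
    frac = min(t / anneal_time, 1.0)
    return start + frac * (finish - start)

def select_action(q_values, t):
    """Epsilon-greedy choice over a list of per-action Q-values at env step t."""
    if random.random() < epsilon_at(t):
        return random.randrange(len(q_values))                    # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```

With the defaults above, epsilon_at(0) is 1.0, epsilon_at(50_000) is 0.525, and from 100,000 steps onward epsilon stays at 0.05, matching the epsilon_anneal_time: 100000 setting.
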
@@ -50,17 +51,17 @@ Using a few of tricks above (bold texts), we enabled QMIX (qmix.yaml) to solve a
 | 2c_vs_64zg | Hard |**100\%**| **100\%** |
 | corridor | Super Hard | 0% | **100\%** |
 | MMM2 | Super Hard | 98% | **100\%** |
-| 3s5z_vs_3s6z | Super Hard | 3% |**85\%**(Number of Envs = 4) |
+| 3s5z_vs_3s6z | Super Hard | 3% |**93\%**(hidden_size = 256, qmix_large.yaml) |
 | 27m_vs_30m | Super Hard | 56% | **100\%** |
 | 6h_vs_8z | Super Hard | 0% | **93\%**($\lambda$ = 0.3) |
 
 
 ## Re-Evaluation
-Afterwards, we re-evaluate numerous QMIX variants with normalized the tricks (a **genaral** set of hyperparameters), and find that QMIX achieves the SOTA.
+Afterwards, we re-evaluate numerous QMIX variants with the tricks normalized (a **general** set of hyperparameters), and find that QMIX achieves SOTA.
 
 | Scenarios | Difficulty | Value-based | | | | | Policy-based | | | |
 |----------------|----------------|:---------------:|:--------------:|:---------------:|:--------------:|:--------------:|:--------------:|--------|:------:|:--------------:|
-| | | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RMC |
+| | | QMIX | VDNs | Qatten | QPLEX | WQMIX | LICA | VMIX | DOP | RIIT |
 | 2c_vs_64zg | Hard | **100%** | **100%** | **100%** | **100%** | 93% | **100%** | 98% | 84% | **100%** |
 | 8m_vs_9m | Hard | **100%** | **100%** | **100%** | 95% | 90% | 48% | 75% | 96% | 95% |
 | 3s_vs_5z | Hard | **100%** | **100%** | **100%** | **100%** | **100%** | 3% | 96% | **100%** | 96% |
@@ -102,7 +103,7 @@ Actor Critic Methods:
 - [**VMIX**: Value-Decomposition Multi-Agent Actor-Critics](https://arxiv.org/abs/2007.12306)
 - [**LICA**: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2007.02529)
 - [**DOP**: Off-Policy Multi-Agent Decomposed Policy Gradients](https://arxiv.org/abs/2007.12322)
-- [**RMC**: Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2102.03479)
+- [**RIIT**: Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning.](https://arxiv.org/abs/2102.03479)
 
 ## Installation instructions
 
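
The **N-step Returns (including GAE($\lambda$) and Q($\lambda$))** trick from the list above, and the $\lambda$ values quoted in the tables (td_lambda: 0.6 in qmix_large.yaml, $\lambda$ = 0.3 for 6h_vs_8z), refer to $\lambda$-return targets. A minimal sketch of the standard backward recursion, assuming per-step rewards and a bootstrap value from a target network; function and argument names are placeholders, not the repository's learner code:

```python
from typing import List

def td_lambda_targets(rewards: List[float],
                      bootstrap_q: List[float],
                      gamma: float = 0.99,
                      td_lambda: float = 0.6) -> List[float]:
    """Lambda-return targets for one episode.

    rewards[t]     -- reward received after step t
    bootstrap_q[t] -- max_a Q_target(s_{t+1}, a); use 0.0 at the terminal step
    Recursion: G_t = r_t + gamma * ((1 - lambda) * bootstrap_q[t] + lambda * G_{t+1})
    """
    targets = [0.0] * len(rewards)
    g_next = 0.0  # return beyond the end of the episode
    for t in reversed(range(len(rewards))):
        g_next = rewards[t] + gamma * ((1.0 - td_lambda) * bootstrap_q[t]
                                       + td_lambda * g_next)
        targets[t] = g_next
    return targets
```

Setting td_lambda = 0 recovers ordinary one-step TD targets and td_lambda = 1 recovers the Monte-Carlo return, so the 0.3 and 0.6 values above trade bias against variance between those extremes.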

ablation_study/exploration.pdf (23.6 KB, binary file not shown)
ablation_study/network_size.pdf (22.8 KB, binary file not shown)
ablation_study/optimizer1.pdf (26.5 KB, binary file not shown)
ablation_study/optimizer2.pdf (25.5 KB, binary file not shown)
ablation_study/process_number.pdf (22.2 KB, binary file not shown)
ablation_study/replay_buffer.pdf (26.2 KB, binary file not shown)
ablation_study/td_lambda.pdf (30.1 KB, binary file not shown)

src/config/algs/qmix_large.yaml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+# --- QMIX specific parameters with large networks ---
+# for 3s5z_vs_3s6z
+
+# use epsilon greedy action selector
+action_selector: "epsilon_greedy"
+epsilon_start: 1.0
+epsilon_finish: 0.05
+epsilon_anneal_time: 100000
+
+runner: "parallel"
+batch_size_run: 8
+buffer_size: 5000
+batch_size: 128
+optimizer: 'adam'
+
+t_max: 10050000
+
+# update the target network every {} episodes
+target_update_interval: 200
+
+# use the Q_Learner to train
+mac: "n_mac"
+agent: "n_rnn"
+agent_output_type: q
+rnn_hidden_dim: 256
+
+learner: "nq_learner"
+mixer: "qmix"
+mixing_embed_dim: 64
+hypernet_embed: 256
+lr: 0.001 # Learning rate for agents
+td_lambda: 0.6
+optimizer: 'adam'
+weight_decay: 0
+
+# rnn layer normalization
+use_layer_norm: False
+
+# orthogonal init for DNN
+use_orthogonal: False
+gain: 0.01
+
+# Priority experience replay
+use_per: False
+per_alpha: 0.6
+per_beta: 0.4
+return_priority: False
+
+name: "qmix_large_env=8_adam_td_lambda"

src/config/algs/riit.yaml

Lines changed: 1 addition & 1 deletion
@@ -39,4 +39,4 @@ entropy_coef: 0.03
 optimizer: 'adam'
 abs: True # monotonicity condition
 
-name: "rmc_env=8_adam_td_lambda"
+name: "riit_env=8_adam_td_lambda"
