Init

kwai · Jun 2, 2021 · 68080ed
1 parent bb5ae49
commit 68080ed
Showing 30 changed files with 2,902 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,9 @@
+*.swp
+*.so
+__pycache__
+.DS_Store
+*.egg-info
+*.pyc
+douzero_checkpoints
+most_recent_model
+eval_data.pkl
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/README.md b/README.md
@@ -1 +1,155 @@
-# DouZero
+# [ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning
+<img width="500" src="./imgs/douzero_logo.jpg" alt="Logo" />
+
+DouZero is a reinforcement learning framework for [DouDizhu](https://en.wikipedia.org/wiki/Dou_dizhu) ([斗地主](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997)), the most popular card game in China. It is a shedding-type game in which each player's objective is to empty their hand of all cards before the other players do. DouDizhu is a very challenging domain, with competition, collaboration, imperfect information, a large state space, and in particular a massive set of possible actions where the legal actions vary significantly from turn to turn. DouZero is developed by AI Platform, Kwai Inc. (快手).
+
+*   Online Demo: [https://www.douzero.org/](https://www.douzero.org/)
+*   Run the Demo Locally: [https://github.com/datamllab/rlcard-showdown](https://github.com/datamllab/rlcard-showdown)
+*   Paper: 
+*   Related Project: [https://github.com/datamllab/rlcard](https://github.com/datamllab/rlcard)
+
+**Community:**
+*  **Slack**: Discuss in [DouZero](https://join.slack.com/t/douzero/shared_invite/zt-rg3rygcw-ouxxDk5o4O0bPZ23vpdwxA) channel.
+*  **QQ Group**:
+
+## Cite this Work
+
+
+## What Makes DouDizhu Challenging?
+In addition to the challenge of imperfect information, DouDizhu has huge state and action spaces. In particular, its action space contains on the order of 10^4 actions (see [this table](https://github.com/datamllab/rlcard#available-environments)). Unfortunately, most reinforcement learning algorithms can only handle very small action spaces. Moreover, the players in DouDizhu need to both compete and cooperate with others in a partially observable environment with limited communication: the two Peasant players play as a team against the Landlord player. Modeling both competition and cooperation is an open research challenge.
+
+In this work, we propose the Deep Monte Carlo (DMC) algorithm with action encoding and parallel actors. This leads to a very simple yet surprisingly effective solution for DouDizhu. Please read our paper for more details.
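
The README does not spell out the algorithm, so the following is only a generic illustration of Monte Carlo value learning in which each candidate action is encoded as features and scored together with the state. One appeal of this setup is that the set of legal actions can differ every turn without requiring a fixed-size output over all actions. The linear scorer, the feature vectors, and every function name here are hypothetical stand-ins, not DouZero's network or API.

```python
import random


def q_value(weights, state, action):
    """Linear score of a (state, action) pair; both are plain feature lists."""
    features = state + action
    return sum(w * x for w, x in zip(weights, features))


def choose_action(weights, state, legal_actions, epsilon, rng):
    """Epsilon-greedy choice over whichever actions are legal this turn."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q_value(weights, state, a))


def mc_update(weights, episode, final_return, lr):
    """Move Q(s, a) toward the episode's return for every visited pair."""
    for state, action in episode:
        error = final_return - q_value(weights, state, action)
        for i, x in enumerate(state + action):
            weights[i] += lr * error * x


# Toy data: a 2-feature state and two encoded actions (all values made up).
weights = [0.0, 0.0, 0.0, 0.0]
state = [1.0, 0.0]
actions = [[1.0, 0.0], [0.0, 1.0]]

# Pretend the second action was played and the game was won (return 1.0).
mc_update(weights, [(state, actions[1])], final_return=1.0, lr=0.5)
best = choose_action(weights, state, actions, epsilon=0.0, rng=random.Random(0))
print(weights, best)
```

With `epsilon=0.0` the choice is purely greedy, so after the single update the sketch prefers the action that led to the winning return.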
+
+## Installation
+Clone the repo with
+```
+git clone https://github.com/daochenzha/douzero.git
+```
+Make sure you have Python 3.5+ installed. Install the dependencies:
+```
+cd douzero
+pip3 install -r requirements.txt
+```
+We recommend installing the stable version of DouZero with
+```
+pip3 install douzero
+```
+or install the up-to-date version (which may be unstable) with
+```
+pip3 install -e .
+```
+
+## Training
+We assume you have at least one GPU available. Run
+```
+python3 train.py
+```
+This will train DouZero on one GPU. To train DouZero on multiple GPUs, use the following arguments:
+*   `--gpu_devices`: which GPU devices are visible
+*   `--num_actor_devices`: how many of the GPU devices will be used for simulation, i.e., self-play
+*   `--num_actors`: how many actor processes will be used for each device
+*   `--training_device`: which device will be used for training DouZero
+
+For example, if we have 4 GPUs and want to use the first 3 GPUs, with 15 actors each, for simulation and the 4th GPU for training, we can run the following command:
+```
+python3 train.py --gpu_devices 0,1,2,3 --num_actor_devices 3 --num_actors 15 --training_device 3
+```
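
To make the arithmetic in the 4-GPU example concrete, here is a small sketch of how such settings could translate into a process layout. It assumes that actors occupy the first `num_actor_devices` visible GPUs and that the training device is an index into the visible list; `plan_devices` is a hypothetical helper, not a function from the repository.

```python
def plan_devices(gpu_devices, num_actor_devices, num_actors, training_device):
    """Return ([(gpu_index, actor_id), ...], training_gpu_index).

    Assumption: actors use the first `num_actor_devices` visible GPUs and
    `training_device` is an index into the visible list.
    """
    visible = [d.strip() for d in gpu_devices.split(",") if d.strip()]
    if num_actor_devices > len(visible):
        raise ValueError("more actor devices requested than visible GPUs")
    if not 0 <= training_device < len(visible):
        raise ValueError("training device index is out of range")
    assignments = [
        (gpu, actor)
        for gpu in range(num_actor_devices)
        for actor in range(num_actors)
    ]
    return assignments, training_device


assignments, trainer = plan_devices("0,1,2,3", 3, 15, 3)
print(len(assignments), "actor processes; training on GPU index", trainer)
# prints: 45 actor processes; training on GPU index 3
```

Under these assumptions the example command corresponds to 3 × 15 = 45 self-play processes, with the fourth GPU reserved for the learner.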
+For more customized configuration of training, see the following optional arguments:
+```
+--xpid XPID           Experiment id (default: douzero)
+--save_interval SAVE_INTERVAL
+                      Time interval (in minutes) at which to save the model
+--objective {adp,wp}  Use ADP or WP as reward (default: ADP)
+--gpu_devices GPU_DEVICES
+                      Which GPUs to be used for training
+--num_actor_devices NUM_ACTOR_DEVICES
+                      The number of devices used for simulation
+--num_actors NUM_ACTORS
+                      The number of actors for each simulation device
+--training_device TRAINING_DEVICE
+                      The index of the GPU used for training models
+--load_model          Load an existing model
+--disable_checkpoint  Disable saving checkpoint
+--savedir SAVEDIR     Root dir where experiment data will be saved
+--total_frames TOTAL_FRAMES
+                      Total environment frames to train for
+--exp_epsilon EXP_EPSILON
+                      The probability for exploration
+--batch_size BATCH_SIZE
+                      Learner batch size
+--unroll_length UNROLL_LENGTH
+                      The unroll length (time dimension)
+--num_buffers NUM_BUFFERS
+                      Number of shared-memory buffers
+--num_threads NUM_THREADS
+                      Number of learner threads
+--max_grad_norm MAX_GRAD_NORM
+                      Max norm of gradients
+--learning_rate LEARNING_RATE
+                      Learning rate
+--alpha ALPHA         RMSProp smoothing constant
+--momentum MOMENTUM   RMSProp momentum
+--epsilon EPSILON     RMSProp epsilon
+```
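
The listing above reads like `argparse` help output. As a rough sketch of how a few of these options could be declared, the snippet below rebuilds a small subset. Only the `xpid` and `objective` defaults come from the help text; every other default is a placeholder chosen for this sketch, and `build_parser` is a hypothetical name rather than the repository's own code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Sketch of a DouZero-style training CLI")
    # These two defaults are stated in the help text above.
    parser.add_argument("--xpid", default="douzero", help="Experiment id")
    parser.add_argument("--objective", default="adp", choices=["adp", "wp"],
                        help="Use ADP or WP as reward")
    # The defaults below are placeholders chosen for this sketch only.
    parser.add_argument("--gpu_devices", default="0", type=str,
                        help="Which GPUs to be used for training")
    parser.add_argument("--num_actor_devices", default=1, type=int,
                        help="The number of devices used for simulation")
    parser.add_argument("--num_actors", default=1, type=int,
                        help="The number of actors for each simulation device")
    parser.add_argument("--training_device", default=0, type=int,
                        help="The index of the GPU used for training models")
    parser.add_argument("--load_model", action="store_true",
                        help="Load an existing model")
    return parser


flags = build_parser().parse_args(
    ["--objective", "wp", "--gpu_devices", "0,1", "--num_actor_devices", "1"]
)
print(flags.xpid, flags.objective, flags.gpu_devices.split(","), flags.load_model)
# prints: douzero wp ['0', '1'] False
```

Declaring `choices` makes `argparse` reject any objective other than `adp` or `wp` before training starts.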
+
+## Evaluation
+Evaluation can be performed on a GPU or a CPU (a GPU will be much faster). Pre-trained models are available at [Google Drive](https://drive.google.com/drive/folders/1NmM2cXnI5CIWHaLJeoDZMiwt6lOTV_UB?usp=sharing) or [Baidu Netdisk (百度网盘)](https://pan.baidu.com/s/18g-JUKad6D8rmBONXUDuOQ) (extraction code: 4624). Put the pre-trained weights in `baselines/`. Performance is evaluated through self-play. We provide pre-trained models and some heuristics as baselines:
+*   [random](douzero/evaluation/random_agent.py): agents that play randomly (uniformly)
+*   [rlcard](douzero/evaluation/rlcard/agent.py): the rule-based agent in [RLCard](https://github.com/datamllab/rlcard)
+*   SL (`baselines/sl/`): deep agents pre-trained on human data
+*   DouZero-ADP (`baselines/douzero_ADP/`): the pre-trained DouZero agents with Average Difference Points (ADP) as the objective
+*   DouZero-WP (`baselines/douzero_WP/`): the pre-trained DouZero agents with Winning Percentage (WP) as the objective
+
+### Step 1: Generate evaluation data
+```
+python3 generate_eval_data.py
+```
+The most important options are as follows.
+*   `--output`: where the pickled data will be saved
+*   `--num_games`: how many random games will be generated (default: 10000)
+
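
As an illustration of what "random games" saved as pickled data might look like, the sketch below deals a 54-card deck into a 20-card Landlord hand and two 17-card Peasant hands, then serializes the result. The integer card codes, the dictionary layout, and the function names are assumptions made for this example; the actual `generate_eval_data.py` may represent deals differently.

```python
import pickle
import random


def build_deck():
    # 13 ranks x 4 copies plus two jokers = 54 cards. Suits do not matter in
    # DouDizhu, so cards are plain integers (hypothetical codes: 3..15 for the
    # thirteen ranks, 16 and 17 for the two jokers).
    return [rank for rank in range(3, 16) for _ in range(4)] + [16, 17]


def deal_game(rng):
    deck = build_deck()
    rng.shuffle(deck)
    # The Landlord holds 20 cards (17 plus the 3 extra cards); each Peasant holds 17.
    return {
        "landlord": sorted(deck[:20]),
        "landlord_up": sorted(deck[20:37]),
        "landlord_down": sorted(deck[37:54]),
    }


def generate_eval_data(num_games, seed=0):
    rng = random.Random(seed)
    return [deal_game(rng) for _ in range(num_games)]


games = generate_eval_data(num_games=5)
blob = pickle.dumps(games)  # in practice this would be written to the output path
print(len(games), [len(hand) for hand in games[0].values()])
# prints: 5 [20, 17, 17]
```

Fixing the seed means the same deals can be replayed for every pairing of agents, which keeps comparisons between them fair.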
+### Step 2: Self-Play
+```
+python3 evaluate.py
+```
+The most important options are as follows.
+*   `--landlord`: which agent will play as Landlord; this can be `random`, `rlcard`, or the path to a pre-trained model
+*   `--landlord_up`: which agent will play as LandlordUp (the one who plays before the Landlord); this can be `random`, `rlcard`, or the path to a pre-trained model
+*   `--landlord_down`: which agent will play as LandlordDown (the one who plays after the Landlord); this can be `random`, `rlcard`, or the path to a pre-trained model
+*   `--eval_data`: the pickle file that contains evaluation data
+
+For example, the following command evaluates DouZero-ADP in the Landlord position against random agents:
+```
+python3 evaluate.py --landlord baselines/douzero_ADP/landlord.ckpt --landlord_up random --landlord_down random
+```
+The following command evaluates DouZero-ADP in the Peasants position against RLCard agents:
+```
+python3 evaluate.py --landlord rlcard --landlord_up baselines/douzero_ADP/landlord_up.ckpt --landlord_down baselines/douzero_ADP/landlord_down.ckpt
+```
+By default, the model is saved in `douzero_checkpoints/douzero` every half hour. We provide a script to help you identify the most recent checkpoint. Run
+```
+sh get_most_recent.sh douzero_checkpoints/douzero/
+```
+The most recent model will be in `most_recent_model`.
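
The contents of `get_most_recent.sh` are not shown in this diff. For readers who prefer Python, the sketch below illustrates one plausible way to find the newest checkpoint, assuming checkpoints are regular files and that recency is judged by modification time. `most_recent_file` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def most_recent_file(directory):
    """Return the path of the most recently modified regular file, or None."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


# Demo with two empty placeholder files whose modification times are set by hand.
with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(["a.ckpt", "b.ckpt"]):
        path = os.path.join(tmp, name)
        with open(path, "w"):
            pass
        os.utime(path, (1000 + i, 1000 + i))  # (access time, modification time)
    newest = os.path.basename(most_recent_file(tmp))

print(newest)  # prints: b.ckpt
```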
+
+## Core Team
+*   Algorithm: [Daochen Zha](https://github.com/daochenzha), [Jingru Xie](https://github.com/karoka), Wenye Ma, Sheng Zhang, Xiangru Lian, Xia Hu, Ji Liu
+*   GUI Demo: [Songyi Huang](https://github.com/hsywhu)
+
+## Acknowledgements
+*   The demo is largely based on [RLCard-Showdown](https://github.com/datamllab/rlcard-showdown)
+*   Code implementation is inspired by [TorchBeast](https://github.com/facebookresearch/torchbeast)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/baselines/put_pretrained_models_here b/baselines/put_pretrained_models_here
diff --git a/douzero/__init__.py b/douzero/__init__.py
diff --git a/douzero/dmc/__init__.py b/douzero/dmc/__init__.py
@@ -0,0 +1,2 @@
+from .dmc import train
+from .arguments import parser