Home
Battle Master is an AI agent created to play Pokemon battles (particularly by connecting to an instance of Pokemon Showdown). It uses the CLARION cognitive architecture as opposed to any particular AI paradigm.
In creator Ron Sun's words from the Oxford Handbook of Cognitive Science, CLARION is
> a hybrid cognitive architecture... that is significantly different from most existing cognitive architectures in several important respects. For one thing, the CLARION cognitive architecture is hybrid in that it (a) combines connectionist and symbolic representations computationally, (b) combines implicit and explicit psychological processes, and (c) combines cognition (in the narrow sense) and other psychological processes (such as motivation and emotion).
CLARION operates on the assumption that all memory has a dual representation - an explicit representation and an implicit representation. To oversimplify, explicit memory is symbolic in nature, and is easily accessible while implicit memory is not readily accessible (e.g., an embedding). There are explicit and implicit processes that operate on each type of memory.
CLARION is composed of 4 major subsystems, each composed of a top-level submodule (for explicit memory/processes) and a bottom-level submodule (for implicit memory/processes). The 4 major subsystems provide the following functionality:
- Action-Centered Subsystem (ACS) - Controls actions based on procedural knowledge
- Non-Action-Centered Subsystem (NACS) - Deals with declarative knowledge
  - Can be further broken down into semantic and episodic modules.
- Motivational Subsystem (MS) - Addresses cognition-motivation interaction
  - Simply maximizing "rewards" raises the question of where rewards come from, and this subsystem provides that context.
  - Implicit motivations are called "drives" - e.g., hunger
  - Explicit motivations are called "goals" - e.g., eat
- Meta-cognitive Subsystem (MCS) - Provides meta-cognitive control and regulation of the MS and other subsystems
Image credit: Oxford Handbook of Cognitive Science
For those interested in a deep dive into CLARION theory, I'll leave it to you to consult the Oxford Handbook of Cognitive Science or Dr. Ron Sun's various publications.
Battle Master uses an experimental implementation of CLARION called pyClarion, created by one of Dr. Sun's PhD students. Unfortunately, the library is quite limited in terms of flow control, enforcing a linear execution through the major subsystems. Below is the general flow of control of Battle Master:
- The agent perceives the current state of the battle
- Implicit drives activate with various strengths given the perception
  - e.g., if the agent's Pokemon has low health, the agent will have a higher drive to switch out the weak Pokemon to preserve it.
- The drive activations in turn activate explicit goals
- The MCS chooses a goal
- The MCS decides the level of effort the agent needs to expend on thinking depending on performance
  - e.g., if the agent is losing the battle, it'll try to "think harder"
- The NACS will employ different processes depending on the effort dictated by the MCS
  - If the agent needs to think hard, it'll try to simulate a couple of turns into the future and pick the best action that forwards the goal (using expectiminimax).
  - If the agent doesn't need to think hard, it'll consider all actions that forward the goal and weigh them according to utility (using its knowledge of the physics of the game).
  - If multiple actions were identified to forward the goal, it'll sample one. If no actions were identified to forward the goal, it'll sample among all useful actions.
    - A "useful" action is one that has some Pokemon typing efficacy. For example, choosing a move with a 1x damage multiplier or switching to a Pokemon with a type advantage.
- The ACS chooses an action among the actions identified by the NACS
  - If the NACS was unable to identify an action, the ACS chooses a random action
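The flow above can be sketched in plain Python. This is only an illustrative sketch and not the actual Battle Master code: it does not use pyClarion, and every name in it (`Perception`, `Action`, `DRIVE_TO_GOAL`, the drive formulas, and the one-to-one drive-to-goal mapping) is a hypothetical stand-in. The "high effort" branch simply keeps the highest-utility action in place of a real look-ahead search.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Every name here is hypothetical; nothing below is the pyClarion API.

@dataclass
class Perception:
    own_hp: float   # remaining HP fraction of the agent's active Pokemon (0..1)
    losing: bool    # True if the agent is currently behind in the battle

@dataclass
class Action:
    name: str
    goal: Optional[str]  # explicit goal this action forwards, if any
    utility: float       # positive weight derived from game knowledge
    useful: bool         # True if the action has some typing efficacy

# Assumed one-to-one mapping from implicit drives to explicit goals.
DRIVE_TO_GOAL = {"preserve": "switch_out", "attack": "deal_damage"}

def drive_strengths(p):
    # Low health strengthens the drive to preserve the active Pokemon.
    return {"preserve": 1.0 - p.own_hp, "attack": p.own_hp}

def mcs_choose_goal(drives):
    # MCS: select the goal fed by the strongest drive.
    return DRIVE_TO_GOAL[max(drives, key=drives.get)]

def mcs_choose_effort(p):
    # MCS: "think harder" when the agent is losing.
    return "high" if p.losing else "low"

def nacs_candidates(actions, goal, effort, rng):
    forwarding = [a for a in actions if a.goal == goal]
    if not forwarding:
        # Fallback: sample among all useful actions, if there are any.
        useful = [a for a in actions if a.useful]
        return [rng.choice(useful)] if useful else []
    if effort == "high":
        # Stand-in for the look-ahead search: keep the best-scoring action.
        return [max(forwarding, key=lambda a: a.utility)]
    # Low effort: sample one goal-forwarding action, weighted by utility.
    return rng.choices(forwarding, weights=[a.utility for a in forwarding], k=1)

def acs_choose(candidates, actions, rng):
    # ACS: pick among the NACS candidates, or act randomly if there are none.
    return rng.choice(candidates) if candidates else rng.choice(actions)

def decide(p, actions, rng):
    goal = mcs_choose_goal(drive_strengths(p))
    effort = mcs_choose_effort(p)
    return acs_choose(nacs_candidates(actions, goal, effort, rng), actions, rng)

actions = [
    Action("Tackle", "deal_damage", 1.0, True),
    Action("Thunderbolt", "deal_damage", 2.0, True),
    Action("Switch to Blastoise", "switch_out", 1.5, True),
]
choice = decide(Perception(own_hp=0.1, losing=True), actions, random.Random(0))
print(choice.name)  # prints "Switch to Blastoise"
```

With the active Pokemon at 10% health, the "preserve" drive dominates, so the only action that forwards the resulting goal is the switch. Passing the random generator in explicitly keeps the sampling steps reproducible.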
All in all, the agent can be thought of as performing simple steering behavior where the agent's motivation and metacognition are in control of the steering. The agent is strictly reflexive, but given more time, additional processes could be added to allow the agent to be more than reflexive.
The goal of the project was not to create the best agent, but rather to experiment with applying a cognitive architecture to a turn-based video game. Regardless, Battle Master was benchmarked by playing 1000 battles each against 4 other agents:
- Random - always selects a random action
- Max Damage - always selects the move that will inflict the most damage on the current opponent Pokemon
  - Never switches and does not pick non-damaging moves
- Simple Heuristic - Selects moves or switches based on some simple heuristics
  - Also selects from among non-damaging moves, e.g., self buffs
- Expectiminimax - An agent that simulates and looks through a game tree 2 layers deep
  - The branching factor of Pokemon is very large, which prohibits deep searches
Opponent | Battle Master Win Rate |
---|---|
Random | 97.2% |
Max Damage | 38.0% |
Simple Heuristic | 28.9% |
Expectiminimax | 27.6% |
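
To get a rough sense of the sampling uncertainty in these numbers, the sketch below computes a normal-approximation 95% interval for each win rate over 1000 battles. It assumes the battles are independent and that each ends in a win or a loss; the helper name `win_rate_interval` is hypothetical and not part of the project.

```python
import math

def win_rate_interval(win_rate, battles, z=1.96):
    """Normal-approximation confidence interval for a binomial proportion."""
    standard_error = math.sqrt(win_rate * (1.0 - win_rate) / battles)
    return win_rate - z * standard_error, win_rate + z * standard_error

# Win rates from the table above, each measured over 1000 battles.
results = {
    "Random": 0.972,
    "Max Damage": 0.380,
    "Simple Heuristic": 0.289,
    "Expectiminimax": 0.276,
}

for opponent, rate in results.items():
    low, high = win_rate_interval(rate, battles=1000)
    print(f"{opponent}: {rate:.1%} (95% interval {low:.1%} to {high:.1%})")
```

At this sample size the intervals are at most about three percentage points on either side, so the gap between the Simple Heuristic and Expectiminimax results is within the noise under these assumptions.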
It is worth noting that the Expectiminimax agent was used to climb to 1700+ Elo on the public Pokemon Showdown ladder (top ~10% of players), though it likely used a search depth greater than 2.
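
For readers unfamiliar with expectiminimax, here is a minimal depth-limited sketch on a toy game tree. It is not pmariglia's implementation: the `Node` class and the example tree are hypothetical, and a real Pokemon search has to handle simultaneous move selection and a much larger branching factor.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str            # "max", "min", "chance", or "leaf"
    value: float = 0.0   # heuristic evaluation, used at leaves and at the depth limit
    # Each child is a (probability, Node) pair; the probability only matters
    # for chance nodes and is ignored for max and min nodes.
    children: list = field(default_factory=list)

def expectiminimax(node, depth):
    """Return the expectiminimax value of `node`, searching `depth` layers."""
    if node.kind == "leaf" or depth == 0 or not node.children:
        return node.value
    scored = [(p, expectiminimax(child, depth - 1)) for p, child in node.children]
    if node.kind == "max":
        return max(v for _, v in scored)
    if node.kind == "min":
        return min(v for _, v in scored)
    # Chance node: probability-weighted average of the outcomes.
    return sum(p * v for p, v in scored)

# Toy decision: a strong move that hits 75% of the time versus a weaker sure hit.
risky = Node("chance", children=[(0.75, Node("leaf", 12.0)), (0.25, Node("leaf", 0.0))])
safe = Node("chance", children=[(1.0, Node("leaf", 8.0))])
root = Node("max", children=[(1.0, risky), (1.0, safe)])

print(expectiminimax(root, depth=2))
```

In this toy tree the risky move has the higher expected value (0.75 × 12 = 9 versus 8), so the maximizing player prefers it. With `depth=0` the search returns the root's own heuristic value, which is how a depth limit trades accuracy for speed.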
While the pyClarion library was highly experimental, I believe the performance of Battle Master, relative to its simplicity and limitations, demonstrates that cognitive architectures are a promising avenue for implementing out-of-the-loop video game agents (agents that don't run inside the game update cycle). There were many ideas left on the cutting room floor for this project. One example is synergistic learning using case-based reasoning plus expectiminimax, where nodes from the simulated game tree could become cases in memory, or generalized game trees could be made into cases. Additionally, metacognition could be used to tune goal activations from drive activations.
I believe the most powerful potential of cognitive architectures in this space is the ability of the architecture to act as a management system among many types of processes that may require different representations - symbolic or subsymbolic. For example, a cognitive architecture could accommodate planning, case-based reasoning, state machines, steering behavior, neural networks, NLP, and many more techniques all in one place and establish synergy among them.
- My wife for supporting me so well through the OMSCS program
- Dr. Jeff Wilson for advising this project
- Dr. Ron Sun for CLARION theory
- Dr. Can Mekik for authoring pyClarion
- Haris Sahovic for implementing poke-env, the library used to interact with Pokemon Showdown and facilitate benchmarking battles
- pmariglia for implementing the expectiminimax agent and a battle simulator
- pokeaimMD for being my domain expert during the project