Home

Battle Master is an AI agent created to play Pokemon battles (particularly by connecting to an instance of Pokemon Showdown). It is built on the CLARION cognitive architecture rather than on any single AI paradigm.

In creator Ron Sun's words from the Oxford Handbook of Cognitive Science, CLARION is

a hybrid cognitive architecture... that is significantly different from most existing cognitive architectures in several important respects. For one thing, the CLARION cognitive architecture is hybrid in that it (a) combines connectionist and symbolic representations computationally, (b) combines implicit and explicit psychological processes, and (c) combines cognition (in the narrow sense) and other psychological processes (such as motivation and emotion).

CLARION Overview

CLARION operates on the assumption that all memory has a dual representation - an explicit representation and an implicit representation. To oversimplify, explicit memory is symbolic in nature, and is easily accessible while implicit memory is not readily accessible (e.g., an embedding). There are explicit and implicit processes that operate on each type of memory.

CLARION is composed of 4 major subsystems, each composed of a top-level submodule (for explicit memory/processes) and a bottom-level submodule (for implicit memory/processes). The 4 major subsystems provide the following functionality:

Action-Centered Subsystem (ACS) - Controls actions based on procedural knowledge
Non-Action-Centered Subsystem (NACS) - Deals with declarative knowledge.
- Can be further broken down into semantic and episodic modules.
Motivational Subsystem (MS) - Addresses cognition-motivation interaction
- Simply maximizing "rewards" raises the question of where rewards come from, and this subsystem provides that context.
- Implicit motivations are called "drives" - e.g., hunger
- Explicit motivations are called "goals" - e.g., eat
Meta-cognitive Subsystem (MCS) - Provides meta-cognitive control and regulation of the MS and other subsystems.

Image credit: Oxford Handbook of Cognitive Science

For those interested in a deep dive into CLARION theory, I'll leave it to you to consult the Oxford Handbook of Cognitive Science or Dr. Ron Sun's various publications.

How Battle Master Works

Battle Master uses an experimental implementation of CLARION called pyClarion, created by one of Dr. Sun's PhD students. Unfortunately, the library is quite limited in terms of flow control, enforcing a linear execution through the major subsystems. Below is the general flow of control of Battle Master:

The agent perceives the current state of the battle
Implicit drives activate with various strengths given the perception
- e.g., if the agent's Pokemon has low health, the agent will have a higher drive to switch out the weak Pokemon to preserve it.
The drive activations in turn activate explicit goals
The MCS chooses a goal
The MCS decides how much effort the agent needs to expend on thinking, depending on its performance
- e.g., if the agent is losing the battle, it'll try to "think harder"
The NACS will employ different processes depending on the effort dictated by the MCS
- If the agent needs to think hard, it'll try to simulate a couple of turns into the future and pick the best action that forwards the goal (using expectiminimax).
- If the agent doesn't need to think hard, it'll consider all actions that forward the goal and weigh them according to utility (using its knowledge of the physics of the game)
If multiple actions were identified to forward the goal, it'll sample one. If no actions were identified to forward the goal, it'll sample among all useful actions.
- A "useful" action is one that has some Pokemon typing efficacy. For example, choosing a move with a 1x damage multiplier or switching to a Pokemon with a type advantage.
The ACS chooses an action among the actions identified by the NACS
- If the NACS was unable to identify an action, the ACS chooses a random action
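
The flow above can be sketched in plain Python. This is an illustrative sketch only, not Battle Master's code or pyClarion's API: the state dictionary, the drive and goal names, and every function name are assumptions made up for the example, and the high-effort branch uses a simple "keep the highest-utility action" stand-in rather than an actual expectiminimax search.

```python
import random

# Hypothetical mapping from each implicit drive to the explicit goal it activates.
DRIVE_TO_GOAL = {"preserve_pokemon": "switch_out", "deal_damage": "attack"}


def drive_strengths(state):
    """Implicit drives: lower health means a stronger drive to preserve the Pokemon."""
    hp = state["active_hp_fraction"]
    return {"preserve_pokemon": 1.0 - hp, "deal_damage": hp}


def choose_goal(drives):
    """MCS: select the goal activated by the strongest drive."""
    return DRIVE_TO_GOAL[max(drives, key=drives.get)]


def choose_effort(state):
    """MCS: think harder when the agent has fewer Pokemon left than the opponent."""
    return "high" if state["own_remaining"] < state["opp_remaining"] else "low"


def propose_actions(state, goal, effort):
    """NACS: map candidate actions to sampling weights.

    state["actions"] maps an action name to (goal it forwards, utility);
    a utility above zero marks the action as "useful" (an assumption of this sketch).
    """
    useful = {a: u for a, (g, u) in state["actions"].items() if u > 0}
    forwarding = {a: u for a, u in useful.items() if state["actions"][a][0] == goal}
    if effort == "high" and forwarding:
        # Stand-in for the lookahead search: keep only the best-rated action.
        best = max(forwarding, key=forwarding.get)
        return {best: forwarding[best]}
    # Fall back to every useful action when nothing forwards the goal.
    return forwarding or useful


def choose_action(state, rng):
    """ACS: sample among the NACS proposals, or act randomly if there are none."""
    goal = choose_goal(drive_strengths(state))
    proposals = propose_actions(state, goal, choose_effort(state))
    if not proposals:
        return rng.choice(sorted(state["actions"]))
    names = sorted(proposals)
    return rng.choices(names, weights=[proposals[n] for n in names], k=1)[0]


# Toy perception: the active Pokemon is weak and the agent is behind.
state = {
    "active_hp_fraction": 0.2,
    "own_remaining": 2,
    "opp_remaining": 4,
    "actions": {
        "move:tackle": ("attack", 1.0),
        "switch:gyarados": ("switch_out", 2.0),
        "switch:pikachu": ("switch_out", 0.5),
    },
}
chosen = choose_action(state, random.Random(0))
```

In this toy state the preservation drive dominates and the agent is losing, so the sketch selects the switch goal with high effort and `chosen` ends up as the highest-rated switch. Passing the random generator in explicitly keeps the sampling step reproducible.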

All in all, the agent can be thought of as performing simple steering behavior, where the agent's motivation and metacognition are in control of the steering. The agent is strictly reflexive, but given more time, additional processes could be added to make it more than reflexive.

Performance

The goal of the project was not to create the best agent, but rather to experiment with applying a cognitive architecture to a turn-based video game. Regardless, Battle Master was benchmarked by playing 1000 battles each against 4 other agents:

Random - always selects a random action
Max Damage - always selects the move that will inflict the most damage on the current opponent Pokemon
- Never switches and does not pick non-damaging moves
Simple Heuristic - Selects moves or switches based on some simple heuristics
- Also selects from among non-damaging moves, e.g., self buffs
Expectiminimax - An agent that simulates and looks through a game tree 2 layers deep
- The branching factor of Pokemon is very large, which prohibits deep searches
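
For readers unfamiliar with expectiminimax, here is a minimal, generic sketch of the algorithm on a hand-built toy tree. It is not pmariglia's implementation: the dictionary-based node format, the `label` field, and the example values are assumptions chosen for illustration. Max nodes are the agent's choices, min nodes are the opponent's replies, and chance nodes average over random outcomes such as a move missing.

```python
def expectiminimax(node, depth):
    """Return the expected value of a node, searching at most `depth` levels down."""
    kind = node["kind"]
    if kind == "leaf" or depth == 0:
        # Leaves carry an exact value; cut-off nodes fall back to a
        # heuristic estimate (0.0 if none was supplied).
        return node.get("value", 0.0)
    if kind == "max":
        return max(expectiminimax(c, depth - 1) for c in node["children"])
    if kind == "min":
        return min(expectiminimax(c, depth - 1) for c in node["children"])
    if kind == "chance":
        # Children of a chance node are (probability, child) pairs.
        return sum(p * expectiminimax(c, depth - 1) for p, c in node["children"])
    raise ValueError(f"unknown node kind: {kind}")


def best_action(root, depth):
    """Return the label of the root child with the highest expectiminimax value."""
    return max(root["children"], key=lambda c: expectiminimax(c, depth - 1))["label"]


def leaf(value):
    return {"kind": "leaf", "value": value}


# Toy tree: a strong move that misses 10% of the time vs. a weaker reliable move.
risky = {
    "kind": "chance",
    "label": "risky move",
    "children": [
        (0.9, {"kind": "min", "children": [leaf(40), leaf(30)]}),
        (0.1, leaf(-10)),
    ],
}
reliable = {"kind": "min", "label": "reliable move", "children": [leaf(25), leaf(35)]}
root = {"kind": "max", "children": [risky, reliable]}

root_value = expectiminimax(root, depth=3)
choice = best_action(root, depth=3)
```

Here the risky move is worth 0.9 * 30 + 0.1 * (-10) = 26 in expectation, which narrowly beats the reliable move's worst case of 25. Every extra level multiplies the number of nodes by the branching factor, which is why deep searches are impractical in Pokemon.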

Opponent	Battle Master Win Rate
Random	97.2%
Max Damage	38.0%
Simple Heuristic	28.9%
Expectiminimax	27.6%
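
As a rough guide to how precise these numbers are, the sketch below computes a normal-approximation 95% margin of error for each win rate at 1000 battles. It assumes the battles are independent trials with a fixed win probability, which is a simplification; the function names are made up for this example.

```python
import math

BATTLES = 1000  # battles played against each opponent

# Battle Master's win rate against each benchmark agent, from the table above.
WIN_RATES = {
    "Random": 0.972,
    "Max Damage": 0.380,
    "Simple Heuristic": 0.289,
    "Expectiminimax": 0.276,
}


def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation 95% confidence interval."""
    return z * standard_error(p, n)


margins = {name: margin_of_error(p, BATTLES) for name, p in WIN_RATES.items()}

for name, p in WIN_RATES.items():
    print(f"{name}: {p:.1%} +/- {margins[name]:.1%}")
```

Under these assumptions every margin is at most about 3 percentage points, so the gap between Max Damage and the other two non-random agents is larger than sampling noise, while the gap between Simple Heuristic and Expectiminimax is not.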

It is worth noting that the Expectiminimax agent was used to climb to 1700+ Elo on the public Pokemon Showdown ladder (top ~10% of players), though it likely used a search depth greater than 2 to do so.

Conclusion

While the pyClarion library was highly experimental, I believe the performance of Battle Master, relative to its simplicity and limitations, demonstrates that cognitive architectures are a promising avenue for implementing out-of-the-loop video game agents (agents that don't run inside the game update cycle). There were many ideas left on the cutting room floor for this project. One example is synergistic learning using case-based reasoning plus expectiminimax, where nodes from the simulated game tree could become cases in memory, or generalized game trees could themselves be stored as cases. Additionally, metacognition could be used to tune how drive activations translate into goal activations.

I believe the most powerful potential of cognitive architectures in this space is the ability of the architecture to act as a management system among many types of processes that may require different representations - symbolic or subsymbolic. For example, a cognitive architecture could accommodate planning, case-based reasoning, state machines, steering behavior, neural networks, NLP, and many more techniques all in one place and establish synergy among them.

Acknowledgements

My wife for supporting me so well through the OMSCS program
Dr. Jeff Wilson for advising this project
Dr. Ron Sun for CLARION theory
Dr. Can Mekik for authoring pyClarion
Haris Sahovic for implementing poke-env, the library used to interact with Pokemon Showdown and facilitate benchmarking battles
pmariglia for implementing the expectiminimax agent and a battle simulator
pokeaimMD for being my domain expert during the project
