AASMA group project 2024

This repository contains code to train and test policies for the MPE Simple Spread environment. Training is done using Deep Q-Learning (DQL) for independent learning. Testing compares three policies: an RL policy, a simple policy, and a complex policy.

Environment (Simple Spread)

This environment has N agents and N landmarks (N=3 by default). At a high level, agents must learn to cover all the landmarks while avoiding collisions.

More specifically, all agents are globally rewarded based on how far the closest agent is to each landmark (the sum of the minimum distances). Locally, agents are penalized if they collide with other agents (-1 for each collision). The relative weight of these rewards can be controlled with the local_ratio parameter.
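
In equation form, a paraphrase of the above, where p_j is the position of agent j and p_ℓ the position of landmark ℓ (treating local_ratio as a convex weight between the global and local terms is an assumption based on the PettingZoo documentation):

$$
r_i = (1 - \text{local\_ratio}) \cdot \Big( -\sum_{\ell} \min_{j} \lVert p_j - p_\ell \rVert \Big) + \text{local\_ratio} \cdot (-1) \cdot \text{collisions}_i
$$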

Agent observations: [self_vel, self_pos, landmark_rel_positions, other_agent_rel_positions, communication]

Agent action space: [no_action, move_left, move_right, move_down, move_up]

Below you can see an example of random agent behaviour in the environment:

[animation: random agent behaviour in the Simple Spread environment]
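
For reference, a minimal sketch of such a random rollout, assuming the PettingZoo 1.24 AEC API for simple_spread_v3:

```python
from pettingzoo.mpe import simple_spread_v3

# Create the Simple Spread environment: N agents, N landmarks.
env = simple_spread_v3.env(N=3, local_ratio=0.5, max_cycles=25, render_mode="human")
env.reset(seed=42)

# AEC loop: each agent acts in turn with a uniformly random action.
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)

env.close()
```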

Training

To start training, use the train.py script. Adjust the hyperparameters and the number of training episodes as needed.
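
The hyperparameters below are illustrative assumptions for this sketch, not the script's actual settings; they are listed only to show the knobs an independent-DQL setup typically exposes:

```python
# Illustrative hyperparameters for independent DQL (names and values are
# assumptions, not the script's actual settings).
NUM_EPISODES = 10_000   # episodes to train for
GAMMA = 0.99            # discount factor
LEARNING_RATE = 1e-3    # optimizer step size
EPSILON_START = 1.0     # initial exploration rate
EPSILON_END = 0.05      # final exploration rate
REPLAY_SIZE = 100_000   # replay buffer capacity
BATCH_SIZE = 64         # minibatch size per Q-update
TARGET_SYNC = 1_000     # env steps between target-network syncs
```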

Below you can see the DQN architecture used for the reinforcement learning task:

[diagram: DQN architecture]
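
The diagram above defines the exact architecture; as a stand-in, here is a minimal PyTorch sketch of a per-agent DQN (the hidden sizes are assumptions; input size 18 follows from the observation layout above with N=3, and output size 5 matches the action space):

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Per-agent Q-network: 18-dim observation in, 5 action values out."""

    def __init__(self, obs_dim: int = 18, n_actions: int = 5, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per discrete action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Independent learning: a separate network (plus a target copy) per agent.
q_networks = {f"agent_{i}": DQN() for i in range(3)}
```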

Testing

To test trained models, use the test.py script with the following arguments: {rl,sp,cp} {rl,sp,cp} {rl,sp,cp} num_of_runs [seeds].

Where the first 3 values correspond to the policies of the 3 agents:

  • rl - reinforcement learning policy
  • sp - simple policy
  • cp - complex policy

Then pass the number of runs you want to test on.

Lastly, you can choose the seeds for your runs, or leave them empty for random seeds. The number of seeds must match the number of runs.
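
For example (the exact invocation format is inferred from the argument list above):

```
python test.py rl sp cp 10           # mixed policies, 10 runs, random seeds
python test.py rl rl rl 3 7 13 42    # all-RL agents, 3 runs with seeds 7, 13, 42
```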

Make sure the models are named correctly and placed in the right folder. The repository contains pre-trained models, named agent_X.pt (where X is the agent number), so you can see possible results.

Policies

  • Simple policy

    Each agent chooses the closest landmark and moves toward it along the X and Y axes, taking the longer axis first (sketched after this list).

    [animation: simple policy]
  • Complex policy

    Agents coordinate where to go based on the closest agent-to-landmark distances (sketched after this list).

    [animation: complex policy]
  • Independent DQL

    Agents were trained using Deep Q-Learning (DQL) with 3 sets of Deep Q-Networks (DQN), one set for each agent.

    [animation: independent DQL policy]
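
As referenced in the list above, here is a hypothetical reconstruction of the two heuristic policies (not the repository's exact code; using scipy.optimize.linear_sum_assignment for the complex policy is an assumption suggested by the SciPy requirement below):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Action indices, matching the action space listed above.
NO_ACTION, LEFT, RIGHT, DOWN, UP = range(5)

def move_towards(agent_pos: np.ndarray, target: np.ndarray) -> int:
    """Step along the axis with the larger remaining distance first."""
    dx, dy = target - agent_pos
    if abs(dx) < 1e-3 and abs(dy) < 1e-3:
        return NO_ACTION  # close enough: stay put
    if abs(dx) >= abs(dy):
        return RIGHT if dx > 0 else LEFT
    return UP if dy > 0 else DOWN

def simple_policy(agent_pos: np.ndarray, landmarks: np.ndarray) -> int:
    """Simple policy: head for the closest landmark."""
    target = landmarks[np.linalg.norm(landmarks - agent_pos, axis=-1).argmin()]
    return move_towards(agent_pos, target)

def complex_policy_targets(agent_positions: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Complex policy: jointly assign agents to distinct landmarks so the
    total agent-to-landmark distance is minimized (Hungarian algorithm)."""
    cost = np.linalg.norm(agent_positions[:, None, :] - landmarks[None, :, :], axis=-1)
    _, landmark_idx = linear_sum_assignment(cost)
    return landmark_idx  # landmark assigned to each agent, in agent order
```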

Results

Here are the best results achieved using DQN across 7 different runs:

[plot: best DQN results across 7 runs]

Requirements

  • PettingZoo environment - pip install 'pettingzoo[mpe]==1.24.3'
  • PyTorch - pip install torch==2.2.2
  • NumPy - pip install numpy
  • SciPy - pip install scipy
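
For convenience, the same requirements in one command:

```
pip install 'pettingzoo[mpe]==1.24.3' torch==2.2.2 numpy scipy
```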
