This implementation provides a reinforcement learning agent using a Deep Q-Network (DQN) to navigate in a 2D environment. The agent learns to find the shortest path to a goal by iteratively exploring the environment and improving its policy.
The main agent class handles interaction with the environment. Key functionalities (an action-selection sketch follows this list):
- Initialization of Q-network, replay buffer, and learning parameters
- Action selection using an epsilon-greedy policy
- Conversion of discrete actions to continuous movements
- Storage of transitions in the replay buffer
- Processing rewards and updating the Q-network
- Epsilon decay over time using a cosine schedule
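A minimal sketch of the epsilon-greedy selection and cosine decay described above, assuming a PyTorch Q-network with one output per discrete action. The exact decay formula, the epsilon floor, and the function names are illustrative assumptions, not taken from the implementation:

```python
import math
import random

import torch


def epsilon_by_step(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Cosine-annealed epsilon; the floor value eps_end is a placeholder."""
    progress = min(step / total_steps, 1.0)
    return eps_end + 0.5 * (eps_start - eps_end) * (1.0 + math.cos(math.pi * progress))


def get_next_discrete_action(q_network, state, epsilon, num_actions=4):
    """Epsilon-greedy choice over the four discrete actions."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```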
Stores transitions (state, action, reward, next_state) for experience replay:
- Uses a double-ended queue with a maximum size of 100,000 transitions
- Allows sampling of transitions for training
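A minimal sketch of such a buffer, assuming transitions are stored as plain tuples:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience replay store (up to 100,000 transitions)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random mini-batch for training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```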
A neural network implementation using PyTorch:
- Two hidden layers with 100 units each and ReLU activation
- Input dimension of 2 (for 2D state space)
- Output dimension of 4 (for the four possible discrete actions)
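A sketch of a network matching this description; the class and attribute names are illustrative:

```python
import torch.nn as nn


class QNetwork(nn.Module):
    """2-D state in, one Q-value per discrete action out."""

    def __init__(self, input_dim=2, hidden_dim=100, output_dim=4):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, state):
        return self.layers(state)
```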
Handles the training of the Q-network; a training-step sketch follows this list:
- Maintains both a primary Q-network and a target Q-network
- Updates the target network periodically to stabilize learning
- Implements loss calculation using the Bellman equation
- Uses Adam optimizer for gradient updates
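A sketch of the training step under these points, assuming a mean-squared-error loss on the Bellman target (the text only says the loss comes from the Bellman equation) and no terminal-state masking, since episodes end on a fixed step count:

```python
import torch


class DQNTrainer:
    """Trains a primary Q-network against a periodically synced target network."""

    def __init__(self, q_network, target_network, gamma=0.99, lr=0.005):
        self.q_network = q_network
        self.target_network = target_network
        self.gamma = gamma
        self.optimiser = torch.optim.Adam(q_network.parameters(), lr=lr)

    def train_step(self, states, actions, rewards, next_states):
        # Q(s, a) for the actions actually taken.
        q_values = self.q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        # Bellman target: r + gamma * max_a' Q_target(s', a').
        with torch.no_grad():
            max_next_q = self.target_network(next_states).max(dim=1).values
            targets = rewards + self.gamma * max_next_q
        loss = torch.nn.functional.mse_loss(q_values, targets)
        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
        return loss.item()

    def update_target_network(self):
        # Copy primary weights into the target network.
        self.target_network.load_state_dict(self.q_network.state_dict())
```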
The agent can take four discrete actions that are converted to continuous movements (see the mapping sketch below):
- 0: Move left (-0.02, 0)
- 1: Move right (0.02, 0)
- 2: Move up (0, 0.02)
- 3: Move down (0, -0.02)
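A simple lookup table is enough for this conversion; the names below are illustrative:

```python
import numpy as np

# Mapping from discrete action index to a continuous (dx, dy) movement.
DISCRETE_TO_CONTINUOUS = {
    0: np.array([-0.02, 0.0], dtype=np.float32),  # left
    1: np.array([0.02, 0.0], dtype=np.float32),   # right
    2: np.array([0.0, 0.02], dtype=np.float32),   # up
    3: np.array([0.0, -0.02], dtype=np.float32),  # down
}


def discrete_to_continuous(action_index):
    return DISCRETE_TO_CONTINUOUS[action_index]
```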
- The agent explores the environment using an epsilon-greedy policy
- Experiences are stored in the replay buffer
- After collecting sufficient data, mini-batches are sampled for training
- The Q-network is updated to minimize the temporal difference error
- The target network is periodically updated to match the Q-network
- Epsilon decreases over time according to a cosine decay schedule (the loop sketch below ties these steps together)
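A hypothetical driver loop tying these steps together, reusing the sketches above; the environment's reset/step interface and the tensor conversion are assumptions:

```python
import numpy as np
import torch


def run_training(env, q_network, replay_buffer, trainer, num_episodes,
                 episode_length=230, warmup=120, batch_size=100, target_every=55):
    """Hypothetical training driver; env.reset() and env.step() are assumed methods."""
    total_steps = num_episodes * episode_length
    step_count = 0
    for _ in range(num_episodes):
        state = env.reset()                                     # assumed env method
        for _ in range(episode_length):
            epsilon = epsilon_by_step(step_count, total_steps)  # cosine decay
            action = get_next_discrete_action(q_network, state, epsilon)
            # Assumed env call: applies a continuous action, returns next state and distance.
            next_state, distance_to_goal = env.step(discrete_to_continuous(action))
            reward = 1.0 - distance_to_goal                     # base reward from the text
            replay_buffer.add(state, action, reward, next_state)

            # Train only after enough transitions have been collected.
            if len(replay_buffer) >= warmup:
                batch = replay_buffer.sample(batch_size)
                states, actions, rewards, next_states = zip(*batch)
                trainer.train_step(
                    torch.as_tensor(np.asarray(states), dtype=torch.float32),
                    torch.as_tensor(actions, dtype=torch.int64),
                    torch.as_tensor(rewards, dtype=torch.float32),
                    torch.as_tensor(np.asarray(next_states), dtype=torch.float32),
                )
            # Periodically sync the target network with the primary network.
            if step_count % target_every == 0:
                trainer.update_target_network()

            state = next_state
            step_count += 1
```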
Rewards are based on the distance to the goal:
- Base reward: 1 - distance_to_goal
- Higher rewards for being closer to the goal
- Scaled rewards for different distance thresholds
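An illustrative reward function consistent with this description. The base term comes from the text, but the thresholds and scaling factors below are placeholders, since the actual values are not specified:

```python
def compute_reward(distance_to_goal):
    """Base reward from the text; the threshold bands and factors are hypothetical."""
    reward = 1.0 - distance_to_goal
    # Hypothetical extra scaling for being within certain distance bands.
    if distance_to_goal < 0.05:
        reward *= 2.0
    elif distance_to_goal < 0.2:
        reward *= 1.5
    return reward
```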
- Episode length: 230 steps
- Minimum buffer size before training starts: 120 transitions
- Mini-batch size: 100 transitions
- Target network update frequency: Every 55 steps
- Discount factor (gamma): 0.99
- Learning rate: 0.005
- Initial epsilon: 1.0
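The same values collected into a configuration dictionary (the key names are illustrative):

```python
HYPERPARAMETERS = {
    "episode_length": 230,       # steps per episode
    "buffer_warmup": 120,        # transitions collected before training starts
    "batch_size": 100,           # mini-batch size
    "target_update_every": 55,   # steps between target-network updates
    "gamma": 0.99,               # discount factor
    "learning_rate": 0.005,      # Adam learning rate
    "initial_epsilon": 1.0,      # starting exploration rate
}
```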
This agent is designed to be used in a compatible environment that provides:
- A 2D state representation
- Distance to goal measurement
- The ability to apply continuous actions
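A minimal sketch of what such an environment interface might look like; the method names and signatures are assumptions:

```python
from typing import Protocol, Tuple

import numpy as np


class CompatibleEnvironment(Protocol):
    """Hypothetical contract; the method names and return types are illustrative."""

    def reset(self) -> np.ndarray:
        """Return the initial 2-D state."""
        ...

    def step(self, continuous_action: np.ndarray) -> Tuple[np.ndarray, float]:
        """Apply a continuous (dx, dy) action and return (next_state, distance_to_goal)."""
        ...
```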
The agent can be integrated into a simulation or training loop by calling its methods in the following order:
- Call get_next_action(state) to get the next action to take
- Apply the action in the environment to get next_state and distance_to_goal
- Call set_next_state_and_distance(next_state, distance_to_goal) to update the agent
- Call has_finished_episode() to check if the episode has ended
For evaluation, the get_greedy_action(state)
method can be used to get the best action without exploration.
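A usage sketch built from the method names above; the Agent constructor and the environment object are assumptions:

```python
agent = Agent()                                    # class name assumed
state = environment.reset()                        # environment object and API assumed

for _ in range(10_000):                            # arbitrary number of training steps
    action = agent.get_next_action(state)          # action to apply in the environment
    next_state, distance_to_goal = environment.step(action)   # assumed env call
    agent.set_next_state_and_distance(next_state, distance_to_goal)
    state = next_state
    if agent.has_finished_episode():               # episode boundary reached
        state = environment.reset()                # start a new episode (assumed)

# Evaluation: query the best action without exploration.
greedy_action = agent.get_greedy_action(state)
```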