Goal-Conditioned Reinforcement Learning and Representations


Official implementation of "Goal-Conditioned Reinforcement Learning and Representations" - an empirical study of goal-conditioned RL algorithms across different observation spaces and architectural configurations in procedurally generated environments. For detailed results and analysis, see our technical report and presentation.

Abstract

Goal-conditioned reinforcement learning enables agents to learn policies that can accomplish a variety of goals. We investigate goal-conditioned reinforcement learning in a custom grid world environment based on the BabyAI framework that lets us specify the goal as a desired world state. We compare the performance of different RL algorithms, including PPO, A2C, and DQN. We also explore the impact of various state and goal representations, along with network architectures for our feature extractors. Our experiments show that PPO outperforms the other algorithms in our setup, and that concatenating fully observable state representations with goal states is an effective input representation for the network. To address challenges with sparse rewards in larger environments, we implement reward shaping based on the distance between the ball and the goal, which enables learning in 6x6 grid worlds. We also test hindsight experience replay, but find that it does not yield significant benefits and substantially underperforms PPO in our specific setup. Our findings demonstrate the potential of goal-conditioned RL for flexibly solving tasks with multiple goals and highlight the importance of appropriate state and goal representations.
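
For illustration, a minimal sketch of this kind of distance-based shaping is shown below; the function name, position arguments, and the 0.1 coefficient are hypothetical and not taken from the repository's implementation.

def shaped_reward(base_reward, prev_ball_pos, ball_pos, goal_pos, coef=0.1):
    # Dense bonus for reducing the Manhattan distance from the ball to the goal.
    prev_dist = abs(prev_ball_pos[0] - goal_pos[0]) + abs(prev_ball_pos[1] - goal_pos[1])
    curr_dist = abs(ball_pos[0] - goal_pos[0]) + abs(ball_pos[1] - goal_pos[1])
    return base_reward + coef * (prev_dist - curr_dist)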

Installation

# Clone repository
git clone https://github.com/AddieFoote/rl-final-project.git
cd rl-final-project

# Create environment
conda create -n gcrl python=3.12
conda activate gcrl

# Install dependencies
pip install -r requirements.txt

Quick Start

Train a PPO agent on our custom dynamic environment:

python testBabyaiEnv.py \
    --env custom-dynamic \
    --obs fully-observable \
    --num-conv-layers 5 \
    --num_envs 32 \
    --num-timesteps 4000000 \
    --size 5 \
    --reward-shaping True
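
For orientation, the command above corresponds roughly to the stable-baselines3 training loop sketched below. This is only a stand-in: it substitutes a standard MiniGrid task and the FlatObsWrapper for the custom-dynamic environment (which is registered by the project's own code) and uses smaller settings than the command.

import minigrid  # registers MiniGrid/BabyAI environments with Gymnasium
from minigrid.wrappers import FlatObsWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Stand-in for the custom-dynamic BabyAI environment used by testBabyaiEnv.py
vec_env = make_vec_env("MiniGrid-Empty-5x5-v0", n_envs=8, wrapper_class=FlatObsWrapper)
model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=100_000)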

Experimental Configuration

Core Parameters

| Parameter | Options | Description |
|---|---|---|
| --env | custom-set-goal, custom-dynamic, room | Environment variant |
| --obs | one-hot, img, fully-observable, fully-observable-one-hot | Observation representation |
| --algorithm | PPO, A2C, DQN, HER | RL algorithm |
| --policy | CnnPolicy, MlpPolicy | Policy network architecture |
| --num-timesteps | Integer | Total training timesteps |
| --num-conv-layers | 3, 5, 8 | CNN feature extractor depth |
| --num_envs | Integer | Number of parallel environments |
| --size | Integer | Environment grid size (square) |
| --goal-features | fully-observable, one-pos, same-network-fully-obs, HER | Goal representation method |
| --reward-shaping | True/False | Enable dense reward signals (default: False) |

We evaluate CNN architectures with 3, 5, and 8 convolutional layers to understand the impact of representation capacity on goal-conditioned learning efficiency.
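
One way such a depth option can be wired into stable-baselines3 is through a custom feature extractor, sketched below; the class name, channel widths, and kernel sizes are assumptions for illustration rather than the repository's exact architecture.

import torch as th
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class ConfigurableCNN(BaseFeaturesExtractor):
    # Depth-configurable CNN over channel-first image observations.
    def __init__(self, observation_space, features_dim=128, num_conv_layers=3):
        super().__init__(observation_space, features_dim)
        in_channels = observation_space.shape[0]
        layers = []
        for _ in range(num_conv_layers):
            layers += [nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU()]
            in_channels = 32
        layers.append(nn.Flatten())
        self.cnn = nn.Sequential(*layers)
        with th.no_grad():  # infer the flattened size from a sample observation
            sample = th.as_tensor(observation_space.sample()[None]).float()
            n_flatten = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations):
        return self.linear(self.cnn(observations))

# Example: PPO("CnnPolicy", env, policy_kwargs=dict(
#     features_extractor_class=ConfigurableCNN,
#     features_extractor_kwargs=dict(num_conv_layers=5)))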

Algorithm-Specific Notes

  • HER (Hindsight Experience Replay): Only compatible with the custom-dynamic environment (a minimal usage sketch follows this list)
  • Reward Shaping: Optional dense reward signal for accelerated learning
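
For reference, the standard stable-baselines3 HER setup looks like the sketch below; it uses the library's toy BitFlippingEnv in place of the custom-dynamic environment, and the replay-buffer hyperparameters are illustrative only.

from stable_baselines3 import DQN, HerReplayBuffer
from stable_baselines3.common.envs import BitFlippingEnv

# Toy goal-conditioned environment with observation/achieved_goal/desired_goal keys
env = BitFlippingEnv(n_bits=10, continuous=False, max_steps=10)
model = DQN(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=10_000)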

Monitoring and Visualization

Generate clean experiment logs:

make board env=target_env

Launch TensorBoard for result visualization:

tensorboard --logdir ./logs/clean/target_env
