Official implementation of "Goal-Conditioned Reinforcement Learning and Representations" - an empirical study of goal-conditioned RL algorithms across different observation spaces and architectural configurations in procedurally generated environments. For detailed results and analysis, see our technical report and presentation.
Abstract: Goal-conditioned reinforcement learning enables agents to learn policies that can accomplish a variety of goals. We investigate goal-conditioned reinforcement learning in a custom grid world environment based on the BabyAI framework that lets us specify the goal as a desired world state. We compare the performance of different RL algorithms, including PPO, A2C, and DQN, and explore the impact of various state and goal representations as well as different network architectures for our feature extractors. Our experiments show that PPO outperforms the other algorithms in our setup, and that concatenating fully observable state representations with goal states is an effective input representation for the network. To address challenges with sparse rewards in larger environments, we implement reward shaping based on the distance between the ball and the goal, which enables learning in 6x6 grid worlds. We also test hindsight experience replay, but find that it does not yield significant benefits and substantially underperforms PPO in our specific setup. Our findings demonstrate the potential of goal-conditioned RL for flexibly solving tasks with multiple goals and highlight the importance of appropriate state and goal representations.
# Clone repository
git clone https://github.com/your-username/goal-conditioned-rl.git
cd goal-conditioned-rl
# Create environment
conda create -n gcrl python=3.12
conda activate gcrl
# Install dependencies
pip install -r requirements.txt
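As a quick sanity check after installation, the core dependencies assumed by this project (Stable-Baselines3 for the RL algorithms and a Gymnasium/MiniGrid stack for the BabyAI-style environments) should import cleanly; the exact packages depend on what `requirements.txt` pins:

```python
# Hypothetical sanity check: confirm the assumed core packages import cleanly.
import gymnasium as gym          # environment API
import stable_baselines3 as sb3  # PPO / A2C / DQN implementations
import minigrid                  # BabyAI-style grid worlds (assumed dependency)

print("gymnasium", gym.__version__)
print("stable-baselines3", sb3.__version__)
```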
Train a PPO agent on our custom dynamic environment:
python testBabyaiEnv.py \
--env custom-dynamic \
--obs fully-observable \
--num-conv-layers 5 \
--num_envs 32 \
--num-timesteps 4000000 \
--size 5 \
--reward-shaping True
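Conceptually, this command wraps a Stable-Baselines3 training loop similar to the minimal sketch below; the environment id, log directory, and save path are placeholders, not names defined by this repository:

```python
# Minimal sketch of the kind of training loop testBabyaiEnv.py wraps,
# assuming Stable-Baselines3 and a Gymnasium-registered environment.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# 32 parallel environment copies, matching --num_envs 32.
# "BabyAI-CustomDynamic-v0" is a placeholder id, not one this repo defines.
vec_env = make_vec_env("BabyAI-CustomDynamic-v0", n_envs=32)

# CnnPolicy expects image-like observations (e.g. the fully observable grid).
model = PPO("CnnPolicy", vec_env, verbose=1, tensorboard_log="./logs/")
model.learn(total_timesteps=4_000_000)  # matches --num-timesteps
model.save("ppo_custom_dynamic")
```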
| Parameter | Options | Description |
|---|---|---|
| `--env` | `custom-set-goal`, `custom-dynamic`, `room` | Environment variant |
| `--obs` | `one-hot`, `img`, `fully-observable`, `fully-observable-one-hot` | Observation representation |
| `--algorithm` | `PPO`, `A2C`, `DQN`, `HER` | RL algorithm |
| `--policy` | `CnnPolicy`, `MlpPolicy` | Policy network architecture |
| `--num-timesteps` | Integer | Total training timesteps |
| `--num-conv-layers` | `3`, `5`, `8` | CNN feature extractor depth |
| `--num_envs` | Integer | Number of parallel environments |
| `--size` | Integer | Environment grid size (square) |
| `--goal-features` | `fully-observable`, `one-pos`, `same-network-fully-obs`, `HER` | Goal representation method |
| `--reward-shaping` | `True`/`False` | Enable dense reward signals (default: `False`) |
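For the `fully-observable` goal representation, the goal grid is concatenated with the current fully observable state before being fed to the feature extractor. The wrapper below is a minimal sketch of that idea; the class name and the `goal_grid` attribute are illustrative assumptions, not part of this repository's API:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ConcatGoalWrapper(gym.ObservationWrapper):
    """Illustrative wrapper: stack the current grid and the goal grid along
    the channel axis so the policy sees (state, goal) jointly."""

    def __init__(self, env):
        super().__init__(env)
        h, w, c = env.observation_space.shape  # assumes a channel-last image Box
        # Twice the channels: first half is the state, second half the goal.
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(h, w, 2 * c), dtype=np.uint8
        )

    def observation(self, obs):
        # `goal_grid` is a hypothetical attribute holding the desired world state.
        goal = self.env.unwrapped.goal_grid
        return np.concatenate([obs, goal], axis=-1)
```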
We evaluate CNN architectures with 3, 5, and 8 convolutional layers to understand the impact of representation capacity on goal-conditioned learning efficiency.
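A variable-depth extractor of this kind can be expressed as a custom Stable-Baselines3 features extractor. The sketch below is illustrative only; the layer widths and kernel sizes are assumptions, not the exact configuration used in our experiments:

```python
import torch
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class GridCnnExtractor(BaseFeaturesExtractor):
    """Illustrative CNN extractor with a configurable number of conv layers
    (e.g. 3, 5, or 8), in the style of SB3's NatureCNN."""

    def __init__(self, observation_space, num_conv_layers=5, features_dim=256):
        super().__init__(observation_space, features_dim)
        channels = observation_space.shape[0]  # assumes channel-first (C, H, W)
        layers = []
        for _ in range(num_conv_layers):
            layers += [nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU()]
            channels = 32
        self.cnn = nn.Sequential(*layers, nn.Flatten())
        with torch.no_grad():  # infer the flattened size with a dummy pass
            sample = torch.as_tensor(observation_space.sample()[None]).float()
            n_flat = self.cnn(sample).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flat, features_dim), nn.ReLU())

    def forward(self, observations):
        return self.linear(self.cnn(observations))
```

Such an extractor could be attached to PPO via `policy_kwargs=dict(features_extractor_class=GridCnnExtractor, features_extractor_kwargs=dict(num_conv_layers=8))`.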
- HER (Hindsight Experience Replay): Only compatible with the `custom-dynamic` environment
- Reward Shaping: Optional dense reward signal for accelerated learning (see the sketch below)
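The shaping term rewards reducing the distance between the ball and the goal. A minimal sketch of such a wrapper is shown below, assuming the environment exposes `ball_pos` and `goal_pos` attributes (hypothetical names, not this repository's API):

```python
import numpy as np
import gymnasium as gym


class BallDistanceShaping(gym.Wrapper):
    """Illustrative reward shaping: add a bonus proportional to how much the
    ball-to-goal Manhattan distance shrinks on each step."""

    def __init__(self, env, coef=0.1):
        super().__init__(env)
        self.coef = coef
        self.prev_dist = None

    def _distance(self):
        # `ball_pos` and `goal_pos` are hypothetical attributes of the custom env.
        ball = np.array(self.env.unwrapped.ball_pos)
        goal = np.array(self.env.unwrapped.goal_pos)
        return np.abs(ball - goal).sum()

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.prev_dist = self._distance()
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        dist = self._distance()
        reward += self.coef * (self.prev_dist - dist)  # dense shaping term
        self.prev_dist = dist
        return obs, reward, terminated, truncated, info
```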
Generate clean experiment logs:
make board env=target_env
Launch TensorBoard for result visualization:
tensorboard --logdir ./logs/clean/target_env