This repository contains the official code for our paper. We present four methods that extend PB2 (Population-Based Bandits) with meta-learning. We evaluated and compared these methods and relevant baselines on two families of reinforcement learning environments: classic control and Brax.
From the main directory, execute the following commands to create a conda environment and install the required packages:
conda create --name meta-pb2 python=3.9
conda activate meta-pb2
pip install -r requirements.txt
Install Singularity following the official instructions, then build a container from the provided definition file:
sudo singularity build singularity.sif singularity.def
Run code with Singularity:
singularity exec --nv -H <path to the main directory> singularity.sif python <python file> <arguments>
The data containing the results of our experiments and the meta-data used by our methods can be downloaded from
After downloading the files, extract them into the root directory of this repository.
Note that
contains the meta-data used by our methods and
contains the results of our experiments.
The experiments can be reproduced by running the configuration files in the configurations directory with the command below.
The command below runs an example configuration that takes only a few minutes to train (2 min on an Apple M3 Pro @ 4.06 GHz) and can be used to check the setup.
A single run from the full experiments takes approximately one hour for classic control (Intel Xeon Gold 6242 @ 2.80 GHz) and seven to eight hours for Brax (Intel Xeon E5-2630 v4 @ 2.20 GHz).
python -m src.run_experiment --moab_id 0 --config_file_name pb2 --experiment_dir configurations/examples
The --moab_id argument specifies which environment and seed are used for the run.
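A single integer index such as --moab_id is a common way to address one cell of an environment × seed grid from a cluster job array. The sketch below shows one possible decoding, assuming environments are enumerated in order and seeds vary fastest; the function decode_moab_id and the environment names are hypothetical, and the actual mapping is defined by the repository's configuration files.

```python
# Hypothetical sketch: decode a flat job index into an (environment, seed) pair.
# The enumeration order (seeds vary fastest) is an assumption; the repository's
# configuration files define the real mapping. Environment names are placeholders.
from typing import List, Tuple


def decode_moab_id(moab_id: int, environments: List[str], n_seeds: int) -> Tuple[str, int]:
    """Map a flat job index to (environment, seed)."""
    n_jobs = len(environments) * n_seeds
    if not 0 <= moab_id < n_jobs:
        raise ValueError(f"moab_id must be in [0, {n_jobs}), got {moab_id}")
    # divmod splits the index into an environment block and a seed offset
    env_index, seed = divmod(moab_id, n_seeds)
    return environments[env_index], seed


if __name__ == "__main__":
    envs = ["env_a", "env_b"]  # placeholder names, not the repository's environments
    for job in range(len(envs) * 3):
        print(job, decode_moab_id(job, envs, n_seeds=3))
```

With this layout, a grid of E environments and S seeds needs job ids 0 to E·S − 1.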
We also provide examples of how to use our methods in your own setup in the examples directory.
The methods introduced in this paper use two types of meta-data. The first type is portfolios, which are used to initialize the hyperparameters. The second type is previous runs, which are used to update the hyperparameters.
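To illustrate the first type of meta-data, the sketch below seeds a population with the configurations from a portfolio and fills any remaining slots by uniform random sampling. This is a minimal illustration under stated assumptions, not the repository's implementation: the function init_population, the hyperparameter names lr and gamma, their bounds, and the random-fill rule are all hypothetical.

```python
# Hypothetical sketch: seed a population with portfolio configurations and fill
# any remaining slots by uniform random sampling. Hyperparameter names, bounds
# and the fill rule are illustrative assumptions, not the repository's code.
import random
from typing import Dict, List, Tuple

Config = Dict[str, float]


def init_population(
    portfolio: List[Config],
    bounds: Dict[str, Tuple[float, float]],
    population_size: int,
    seed: int = 0,
) -> List[Config]:
    rng = random.Random(seed)
    # Copy portfolio entries so later changes to members do not alter the portfolio.
    population = [dict(config) for config in portfolio[:population_size]]
    # Sample the remaining members uniformly within the search-space bounds.
    while len(population) < population_size:
        population.append({name: rng.uniform(low, high) for name, (low, high) in bounds.items()})
    return population


if __name__ == "__main__":
    portfolio = [{"lr": 3e-4, "gamma": 0.99}, {"lr": 1e-3, "gamma": 0.95}]  # placeholder values
    bounds = {"lr": (1e-5, 1e-2), "gamma": (0.9, 0.999)}
    for member in init_population(portfolio, bounds, population_size=4):
        print(member)
```

Copying the portfolio entries keeps the stored portfolio unchanged when population members are later perturbed during training.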
Creating portfolios is done in three steps as described in Section 4.4 of our paper.
We created a small example configuration that contains only two environments, each with two gravity variants, to demonstrate the process. The following commands generate the portfolios.
The first step is to generate portfolio candidates by finding optimal hyperparameter configurations for all environments by running:
for id in {0..3}; do
    python -m src.generate_starting_regions --moab_id $id --config_file_name portfolio --experiment_dir configurations/examples --max_concurrent 4 --n_best_configs 1
done
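If bash is not available, or you want the loop to stop at the first failing job, the first step can also be driven from Python. The module name and flags below are copied from the first-step command; the helper functions step1_commands and run_all are hypothetical and not part of the repository. Using sys.executable runs the jobs with the interpreter of the currently active environment (e.g. the meta-pb2 conda environment).

```python
# Sketch: the first portfolio step driven from Python instead of a bash loop.
# Module name and flags come from the README command; the helpers are hypothetical.
import shlex
import subprocess
import sys
from typing import List


def step1_commands(n_jobs: int = 4) -> List[List[str]]:
    """Build one generate_starting_regions command per --moab_id (0..n_jobs-1)."""
    return [
        [sys.executable, "-m", "src.generate_starting_regions",
         "--moab_id", str(job_id),
         "--config_file_name", "portfolio",
         "--experiment_dir", "configurations/examples",
         "--max_concurrent", "4",
         "--n_best_configs", "1"]
        for job_id in range(n_jobs)
    ]


def run_all(commands: List[List[str]], dry_run: bool = False) -> None:
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the command without executing it
        else:
            # check=True raises CalledProcessError on the first failing job
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; call run_all(step1_commands()) from the repository root to launch the jobs.
    run_all(step1_commands(), dry_run=True)
```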
The second step reruns the best configurations on each environment to later create a performance matrix.
for id in {0..15}; do
    python -m src.rerun_starting_regions --moab_id $id --config_file_name portfolio --initial_configs_save_dir ray_results/examples/portfolio/initial_configs --experiment_dir configurations/examples
done
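The 16 reruns of the second step correspond to the 4 candidate configurations (one best configuration per environment) evaluated on each of the 4 environments, which together fill a 4 × 4 performance matrix. The sketch below shows how such a matrix could be assembled, assuming job id = config_index × n_envs + env_index; this layout, the helper names, and the scores in the usage example are hypothetical, and the real layout is determined by src.rerun_starting_regions.

```python
# Hypothetical sketch: assemble a performance matrix from the second-step reruns.
# Assumes job id = config_index * n_envs + env_index; scores are placeholders.
import math
from typing import Dict, List, Tuple


def job_to_cell(job_id: int, n_envs: int) -> Tuple[int, int]:
    """Return (config_index, env_index) for a flat rerun job id."""
    return divmod(job_id, n_envs)


def build_performance_matrix(scores: Dict[int, float], n_configs: int, n_envs: int) -> List[List[float]]:
    # Cells for jobs that have not reported a score stay NaN.
    matrix = [[math.nan] * n_envs for _ in range(n_configs)]
    for job_id, score in scores.items():
        config_index, env_index = job_to_cell(job_id, n_envs)
        matrix[config_index][env_index] = score
    return matrix


if __name__ == "__main__":
    dummy_scores = {job_id: float(job_id) for job_id in range(16)}  # placeholder scores, not results
    for row in build_performance_matrix(dummy_scores, n_configs=4, n_envs=4):
        print(row)
```

Leaving missing cells as NaN makes incomplete reruns visible instead of silently treating them as zero reward.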
The last step creates the portfolio by running the following command.
python -m src.generate_portfolio --initial_regions_save_dir ray_results/examples/portfolio/initial_configs --portfolio_size 2
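Once the performance matrix exists, a portfolio of size k (here --portfolio_size 2) can be chosen from the candidates. The sketch below uses greedy forward selection, a common portfolio-construction technique: it repeatedly adds the candidate that most improves the mean, over environments, of the best score achieved by any portfolio member. This is not necessarily the procedure from Section 4.4 of the paper; it assumes higher scores are better and that scores are comparable across environments (e.g. normalized), and the function greedy_portfolio is hypothetical.

```python
# Sketch of greedy forward portfolio selection (a common technique; the paper's
# Section 4.4 procedure may differ). perf[c][e] is the score of candidate c on
# environment e, assumed normalized so that higher is better.
from typing import List


def greedy_portfolio(perf: List[List[float]], portfolio_size: int) -> List[int]:
    n_configs = len(perf)
    n_envs = len(perf[0])
    selected: List[int] = []

    def portfolio_score(members: List[int]) -> float:
        # Each environment is credited with its best-performing portfolio member.
        return sum(max(perf[c][e] for c in members) for e in range(n_envs)) / n_envs

    while len(selected) < min(portfolio_size, n_configs):
        # Add the candidate whose inclusion yields the highest portfolio score.
        best = max(
            (c for c in range(n_configs) if c not in selected),
            key=lambda c: portfolio_score(selected + [c]),
        )
        selected.append(best)
    return selected


if __name__ == "__main__":
    perf = [[1.0, 0.0], [0.0, 0.9], [0.6, 0.6]]  # placeholder matrix: 3 candidates x 2 environments
    print(greedy_portfolio(perf, portfolio_size=2))  # [2, 0]
```

In the placeholder example, candidate 2 is chosen first because it is the best single configuration on average, and candidate 0 is added next because it raises the score on the first environment the most. Greedy selection is cheap but is not guaranteed to find the best subset of size k.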
Set the --save_meta flag when running an experiment to save the results in a format that the meta-learning methods can use.
python -m src.run_experiment --moab_id 0 --config_file_name pb2 --experiment_dir configurations_classic_control --save_meta
The downloaded (see Data Download) or newly generated (see Creating Meta-Data) results can be visualized interactively in the visualizations.ipynb notebook.
If you use MetaPB2 in your work, please cite the following paper:
title={Meta-learning Population-based Methods for Reinforcement Learning},
author={Johannes Hog and Raghu Rajan and Andr{\'e} Biedenkapp and Noor Awad and Frank Hutter and Vu Nguyen},
journal={Transactions on Machine Learning Research},