Reinforcement Learning with Pretraining for Pricing and Repositioning in Ride-Hailing Networks

This repository contains the code and data used to reproduce the numerical experiments from the paper "Reinforcement Learning with Pretraining for Pricing and Driver Repositioning in Ride-Hailing Networks".

Overview

This project implements a reinforcement learning framework for jointly optimizing pricing and vehicle repositioning decisions in ride-hailing networks. The approach combines:

A Markov decision process (MDP) formulation.
Two-timescale actor-critic methods based on Proximal Policy Optimization (PPO).
Pretraining using expert policies derived from lookahead strategies.
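For readers unfamiliar with PPO, the sketch below shows the standard clipped surrogate objective that PPO maximizes. It is a minimal plain-Python illustration of the textbook formula, not the customized two-timescale variant implemented in this repository; the function name and the default clip_eps=0.2 are assumptions chosen for illustration.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples.

    Hypothetical helper for illustration only; not part of the repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio equals 1 and the objective reduces to the mean advantage. Clipping only limits how much a single update can gain from moving the ratio away from 1.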

Data

The raw data used in Section 5 is publicly available from the NYC Taxi & Limousine Commission: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page

We use the following datasets:

February 2019 – High Volume For-Hire Vehicle Trip Records
March 2019 – High Volume For-Hire Vehicle Trip Records

The scripts in the data folder preprocess these datasets to estimate model parameters such as arrival rates, travel times, and distances.
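As a rough illustration of what estimating such parameters can look like, the sketch below computes sample-average arrival rates, travel times, and distances per origin-destination pair. It is an assumption-laden toy example: the record layout, the function name estimate_parameters, and the simple averaging are hypothetical, and the actual scripts in the data folder may use different estimators, time windows, or zone aggregations.

```python
from collections import defaultdict


def estimate_parameters(trips, n_hours):
    """Estimate per-pair parameters from trip records.

    trips: iterable of (origin, destination, minutes, miles) tuples.
    n_hours: length of the observation window in hours.
    Hypothetical helper for illustration only; not part of the repository.
    """
    counts = defaultdict(int)
    minutes_sum = defaultdict(float)
    miles_sum = defaultdict(float)
    for origin, dest, minutes, miles in trips:
        key = (origin, dest)
        counts[key] += 1
        minutes_sum[key] += minutes
        miles_sum[key] += miles

    params = {}
    for key, n in counts.items():
        params[key] = {
            # Requests per hour observed for this origin-destination pair.
            "arrival_rate_per_hour": n / n_hours,
            # Sample means over the trips observed for this pair.
            "mean_travel_minutes": minutes_sum[key] / n,
            "mean_distance_miles": miles_sum[key] / n,
        }
    return params


# Hypothetical toy records, not taken from the TLC data.
toy_trips = [(1, 2, 10.0, 2.0), (1, 2, 14.0, 3.0), (2, 1, 20.0, 5.0)]
toy_params = estimate_parameters(toy_trips, n_hours=2)
```

The sketch uses plain Python dictionaries for brevity; the repository stores its estimated parameters as NumPy files.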

Dependencies

The project mainly relies on the following packages:

Python 3.10+
NumPy
PyTorch
Gymnasium
Stable-Baselines3
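A quick way to confirm that the environment is ready is to check the Python version and whether each package can be located. The snippet below is a small convenience sketch, not a script shipped with the repository; the import names (numpy, torch, gymnasium, stable_baselines3) are the usual module names for the packages listed above, and no specific package versions are assumed because none are stated here.

```python
import importlib.util
import sys

# Import names corresponding to the packages listed above.
REQUIRED_MODULES = ["numpy", "torch", "gymnasium", "stable_baselines3"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


python_ok = sys.version_info >= (3, 10)
missing = missing_modules()
print("Python 3.10+:", python_ok)
print("Missing packages:", missing or "none")
```

Any package reported as missing can typically be installed with pip, for example: pip install numpy torch gymnasium stable-baselines3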

Core modules

These folders contain the main components of the methodology:

data: Scripts to preprocess raw data and generate model parameters (arrival rates, distances, travel times), stored as NumPy files.
env_1timescale: Implementation of the single-timescale MDP formulation using Gymnasium.
envs_2timescale: Implementation of the two-timescale MDP formulation using Gymnasium.
ppo_2timescale: Implementation of the two-timescale actor-critic framework along with a customized PPO algorithm, building on Stable-Baselines3.
pretraining_2timescale: Implementation of actor and critic pretraining procedures tailored to the proposed two-timescale actor-critic framework.
pretraining_1timescale: Implementation of actor and critic pretraining procedures tailored to the single-timescale actor-critic framework. Note that the single-timescale PPO implementation is already provided by Stable-Baselines3.

Additional modules

These folders contain scripts used to generate the results presented in the paper:

PPO_no_pretraining: Training and evaluation of PPO agents without pretraining.
PPO_pretraining: Pretraining of actor-critic networks followed by PPO training and evaluation.
LH_strategies: Simulation of lookahead strategies used to generate expert state-decision datasets.
Experiments: Scripts and output data used to generate figures and tables in Section 5. Subfolders correspond to the specific subsections of the paper.

Installation

Clone the repository:

git clone https://github.com/ThomasDeMunck12/RL_wPT_RHNetworks.git
cd RL_wPT_RHNetworks
