---
title: Exploration in Reward Machines with Low Regret
software:
abstract: We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge in the form of reward machines is available to the learner. Specifically, we investigate the efficiency of RL under the average-reward criterion, in the regret minimization setting. We propose two model-based RL algorithms, each of which exploits the structure of the reward machines, and show that our algorithms achieve regret bounds that improve over those of baselines by a multiplicative factor proportional to the number of states in the underlying reward machine. To the best of our knowledge, the proposed algorithms and associated regret bounds are the first to tailor the analysis specifically to reward machines, in either the episodic or average-reward setting. We also present a regret lower bound for the studied setting, which indicates that the proposed algorithms achieve near-optimal regret. Finally, we report numerical experiments that demonstrate the superiority of the proposed algorithms over existing baselines in practice.
section: Regular Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: bourel23a
month: 0
tex_title: Exploration in Reward Machines with Low Regret
firstpage: 4114
lastpage: 4146
page: 4114-4146
order: 4114
cycles: false
bibtex_author: Bourel, Hippolyte and Jonsson, Anders and Maillard, Odalric-Ambrym and Talebi, Mohammad Sadegh
author:
- given: Hippolyte
  family: Bourel
- given: Anders
  family: Jonsson
- given: Odalric-Ambrym
  family: Maillard
- given: Mohammad Sadegh
  family: Talebi
date: 2023-04-11
address:
container-title: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
volume: '206'
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 11
pdf:
extras:
---