2023-04-11-baek23a.md

title: 'TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation'
software:
abstract: 'Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule we dub TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and a suite of benchmark datasets from a deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models.'
section: Regular Papers
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: baek23a
month: 0
tex_title: 'TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation'
firstpage: 11132
lastpage: 11148
page: 11132-11148
order: 11132
cycles: false
bibtex_author: Baek, Jackie and Farias, Vivek
author:
- given: Jackie
  family: Baek
- given: Vivek
  family: Farias
date: 2023-04-11
address:
container-title: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
volume: 206
genre: inproceedings
issued:
  date-parts:
  - 2023
  - 4
  - 11
pdf:
extras:
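
The abstract says that TS-UCB scores each arm from two ingredients, posterior sample(s) and upper confidence bounds, but it does not give the scoring formula. The sketch below is one plausible reading for a Bernoulli K-armed bandit with Beta posteriors, not the authors' implementation. Everything specific in it is an assumption made for illustration: the ratio-style score (average sampled best value minus the arm's posterior mean, divided by the arm's confidence radius, with the smallest score selected), the `sqrt(2 log t / n)` radius, and the names `ts_ucb_choose` and `run`. Consult the paper for the actual definition.

```python
import math
import random


def ts_ucb_choose(alpha, beta, t, num_samples=1, rng=random):
    """Pick an arm from Beta(alpha[a], beta[a]) posteriors (assumed scoring rule)."""
    k = len(alpha)

    # Ingredient 1: posterior samples. Each joint draw yields the sampled mean
    # of its best arm; average that value over num_samples draws.
    f_tilde = sum(
        max(rng.betavariate(alpha[a], beta[a]) for a in range(k))
        for _ in range(num_samples)
    ) / num_samples

    best_arm, best_score = 0, math.inf
    for a in range(k):
        n = alpha[a] + beta[a]          # pseudo-count of observations for arm a
        mean = alpha[a] / n             # posterior mean
        # Ingredient 2: confidence radius, i.e. UCB minus mean (assumed form).
        radius = math.sqrt(2.0 * math.log(max(t, 2)) / n)
        score = (f_tilde - mean) / radius   # assumed ratio-style score
        if score < best_score:
            best_arm, best_score = a, score
    return best_arm


def run(true_means, horizon, num_samples=1, seed=0):
    """Simulate the rule on a Bernoulli bandit and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k   # Beta(1, 1) uniform priors
    beta = [1.0] * k
    pulls = [0] * k
    for t in range(1, horizon + 1):
        a = ts_ucb_choose(alpha, beta, t, num_samples, rng)
        reward = 1 if rng.random() < true_means[a] else 0
        alpha[a] += reward
        beta[a] += 1 - reward
        pulls[a] += 1
    return pulls


if __name__ == "__main__":
    print(run([0.1, 0.9], horizon=2000, num_samples=5))
```

The `num_samples` parameter mirrors the abstract's remark that the rule is flexible in the number of posterior samples it takes as input. In this sketch, compared with plain Thompson sampling, the only extra work per step is one pass over the arms to compute the scores.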