| Field | Value |
|---|---|
| title | TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation |
| software | |
| abstract | Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule, which we dub TS-UCB, that requires negligible additional computational effort but provides significant performance improvements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehensive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and the benchmark datasets in the deep bandit suite proposed by Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models. |
| section | Regular Papers |
| layout | inproceedings |
| series | Proceedings of Machine Learning Research |
| publisher | PMLR |
| issn | 2640-3498 |
| id | baek23a |
| month | 0 |
| tex_title | TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation |
| firstpage | 11132 |
| lastpage | 11148 |
| page | 11132-11148 |
| order | 11132 |
| cycles | false |
| bibtex_author | Baek, Jackie and Farias, Vivek |
| author | |
| date | 2023-04-11 |
| address | |
| container-title | Proceedings of The 26th International Conference on Artificial Intelligence and Statistics |
| volume | 206 |
| genre | inproceedings |
| issued | |
| extras | |