title | abstract | section | layout | series | publisher | issn | id | month | tex_title | firstpage | lastpage | page | order | cycles | bibtex_author | author | date | address | container-title | volume | genre | issued | extras | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Piecewise Stationary Bandits under Risk Criteria |
Piecewise stationary stochastic multi-armed bandits have been extensively explored in the risk-neutral and sub-Gaussian setting. In this work, we consider a multi-armed bandit framework in which the reward distributions are heavy-tailed and non-stationary, and evaluate the performance of algorithms using general risk criteria. Specifically, we make the following contributions: (i) We first propose a non-parametric change detection algorithm that can detect general distributional changes in heavy-tailed distributions. (ii)We then propose a truncation-based UCB-type bandit algorithm integrating the above regime change detection algorithm to minimize the regret of the non-stationary learning problem. (iii) Finally, we establish the regret bounds for the proposed bandit algorithm by characterizing the statistical properties of the general change detection algorithm, along with a novel regret analysis. |
Regular Papers |
inproceedings |
Proceedings of Machine Learning Research |
PMLR |
2640-3498 |
bhatt23b |
0 |
Piecewise Stationary Bandits under Risk Criteria |
4313 |
4335 |
4313-4335 |
4313 |
false |
Bhatt, Sujay and Fang, Guanhua and Li, Ping |
|
2023-04-11 |
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics |
206 |
inproceedings |
|