
Quantum Heavy-tailed Bandits


Bibliographic Details
Published in: arXiv.org, 2023-01
Main Authors: Wu, Yulian; Guan, Chaowen; Aggarwal, Vaneet; Wang, Di
Format: Article
Language: English
Description
Summary: In this paper, we study multi-armed bandits (MAB) and stochastic linear bandits (SLB) with heavy-tailed rewards and a quantum reward oracle. Unlike previous work on quantum bandits, which assumes bounded or sub-Gaussian reward distributions, here we investigate the quantum bandits problem under the weaker assumption that the reward distributions only have a bounded \((1+v)\)-th moment for some \(v\in (0,1]\). To achieve regret improvements for heavy-tailed bandits, we first propose a new quantum mean estimator for heavy-tailed distributions, which is based on the Quantum Monte Carlo Mean Estimator and achieves a quadratic improvement in estimation error over its classical counterpart. Building on our quantum mean estimator, we focus on quantum heavy-tailed MAB and SLB and propose quantum algorithms based on the Upper Confidence Bound (UCB) framework for both problems, with \(\tilde{O}(T^{\frac{1-v}{1+v}})\) regrets, polynomially improving the dependence on \(T\) compared to the classical (near-)optimal regrets of \(\tilde{O}(T^{\frac{1}{1+v}})\), where \(T\) is the number of rounds. Finally, experiments support our theoretical results and demonstrate the effectiveness of the proposed methods.
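The quantum estimator described in the abstract requires a quantum reward oracle, but the classical baseline it improves on can be sketched concretely: a UCB algorithm whose arm means are estimated with a truncated empirical mean, the standard device for rewards with only a bounded \((1+v)\)-th moment. The sketch below is an illustrative classical implementation, not the paper's algorithm; the truncation threshold and confidence bonus follow the standard heavy-tailed bandit analysis with an assumed moment bound \(u\), and all function names and constants here are our own.

```python
import numpy as np


def truncated_mean(samples, u, v, delta):
    """Truncated empirical mean for distributions with bounded (1+v)-th moment.

    The i-th sample is kept only if |X_i| is below a threshold that grows
    with i; clipped samples are replaced by 0. u bounds E|X|^(1+v).
    """
    n = len(samples)
    idx = np.arange(1, n + 1)
    thresh = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + v))
    clipped = np.where(np.abs(samples) <= thresh, samples, 0.0)
    return clipped.mean()


def heavy_tailed_ucb(arms, T, u=1.0, v=1.0, delta=None, rng=None):
    """Classical UCB for heavy-tailed MAB using the truncated mean estimator.

    arms: list of callables, each taking an np.random.Generator and
          returning one reward draw for that arm.
    Returns the per-arm pull counts and the cumulative reward.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    K = len(arms)
    if delta is None:
        delta = 1.0 / T**2  # confidence level per estimate
    rewards = [[] for _ in range(K)]
    pulls = np.zeros(K, dtype=int)
    total = 0.0
    for t in range(T):
        if t < K:
            a = t  # pull each arm once to initialize
        else:
            ucb = np.empty(K)
            for k in range(K):
                n = pulls[k]
                # Confidence width matching the truncated-mean error rate
                # n^{-v/(1+v)}; the quantum estimator's quadratically
                # smaller error is what shrinks the regret.
                bonus = 4.0 * u ** (1.0 / (1.0 + v)) \
                    * (np.log(1.0 / delta) / n) ** (v / (1.0 + v))
                ucb[k] = truncated_mean(np.array(rewards[k]), u, v, delta) + bonus
            a = int(np.argmax(ucb))
        r = arms[a](rng)
        rewards[a].append(r)
        pulls[a] += 1
        total += r
    return pulls, total
```

As a usage example, two arms with Student-t(3) noise (finite variance, heavy tails) and means 0 and 1 can be passed as `[lambda r: r.standard_t(3), lambda r: 1.0 + r.standard_t(3)]`; the better arm accumulates most of the pulls. In the paper's setting, the classical `truncated_mean` is replaced by the proposed quantum mean estimator, whose quadratically smaller estimation error is what drives the regret down from \(\tilde{O}(T^{\frac{1}{1+v}})\) to \(\tilde{O}(T^{\frac{1-v}{1+v}})\).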
ISSN:2331-8422