Loading…
Dominant strategy truthful, deterministic multi-armed bandit mechanisms with logarithmic regret for sponsored search auctions
Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω ( T 2/3 ), where T is the number of...
Saved in:
Published in: | Applied intelligence (Dordrecht, Netherlands) Netherlands), 2022-02, Vol.52 (3), p.3209-3226 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Stochastic multi-armed bandit (MAB) mechanisms are widely used in sponsored search auctions, crowdsourcing, online procurement, etc. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of
Ω
(
T
2/3
), where
T
is the number of time steps. This happens because the existing mechanisms consider the worst case scenario where the means of the agents’ stochastic rewards are separated by a very small amount that depends on
T
. We make, and, exploit the crucial observation that in most scenarios, the separation between the agents’ rewards is rarely a function of
T
. Moreover, in the case that the rewards of the arms are arbitrarily close, the regret contributed by such sub-optimal arms is minimal. Our idea is to allow the center to indicate the resolution,
Δ
, with which the agents must be distinguished. This immediately leads us to introduce the notion of
Δ
-Regret. Using sponsored search auctions as a concrete example (the same idea applies for other applications as well), we propose a dominant strategy incentive compatible (DSIC) and individually rational (IR), deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. Remarkably, the proposed mechanism
Δ
-UCB achieves a
Δ
-regret of
O
(
log
T
)
for the case of sponsored search auctions. We first establish the results for single slot sponsored search auctions and then non-trivially extend the results to the case where multiple slots are to be allocated. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-021-02387-2 |