Good arm identification via bandit feedback
Published in: Machine Learning, 2019-05, Vol. 108(5), pp. 721-745
Main Authors:
Format: Article
Language: English
Summary: We consider a novel stochastic multi-armed bandit problem called *good arm identification* (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as good, before confirming that the remaining arms are actually not good. The objective of GAI is to minimize the number of samples in each process. We find that GAI faces a new kind of dilemma, the *exploration-exploitation dilemma of confidence*, which differs from the dilemma in best arm identification; as a result, an efficient algorithm design for GAI is quite different from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor O(log(1/δ)) for acceptance error rate δ, and we develop an algorithm whose sample complexity almost matches this lower bound. We also confirm experimentally that the proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
ISSN: 0885-6125 (print), 1573-0565 (electronic)
DOI: 10.1007/s10994-019-05784-4
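
To make the GAI setting in the summary concrete, below is a minimal sketch of a confidence-bound approach, assuming Bernoulli rewards in [0, 1]. The identification rule (output an arm as good once its lower confidence bound clears the threshold, reject it once its upper bound falls below) follows the problem definition above; the optimistic sampling rule and the constant inside the confidence radius are illustrative assumptions, not the algorithm proposed in the paper.

```python
import math
import random


def gai_sketch(arms, threshold, delta, horizon=100_000):
    """Output arms believed to have mean reward >= threshold.

    arms: list of zero-argument callables returning a reward in [0, 1].
    Returns indices of arms output as "good", in the order identified.
    """
    K = len(arms)
    counts = [0] * K        # pulls per arm
    sums = [0.0] * K        # cumulative reward per arm
    active = set(range(K))  # arms not yet accepted or rejected
    good = []               # arms already output as good

    def radius(i, t):
        # Anytime Hoeffding-style confidence radius; the constant 4*K*t^2
        # is an illustrative choice, not the paper's exact bound.
        return math.sqrt(math.log(4 * K * t * t / delta) / (2 * counts[i]))

    for t in range(1, horizon + 1):
        if not active:
            break

        # Sampling rule (an assumption): pull the active arm with the
        # highest upper confidence bound, i.e. the most optimistic one.
        def ucb(i):
            if counts[i] == 0:
                return float("inf")
            return sums[i] / counts[i] + radius(i, t)

        i = max(active, key=ucb)
        counts[i] += 1
        sums[i] += arms[i]()

        # Identification rule: accept as soon as the lower bound clears
        # the threshold; reject once the upper bound falls below it.
        mean, rad = sums[i] / counts[i], radius(i, t)
        if mean - rad >= threshold:
            good.append(i)      # output this arm as good immediately
            active.discard(i)
        elif mean + rad < threshold:
            active.discard(i)   # confirmed not good

    return good


# Example: three Bernoulli arms; arms 0 and 1 are good for threshold 0.5.
random.seed(0)
arms = [lambda p=p: float(random.random() < p) for p in (0.8, 0.6, 0.3)]
print(gai_sketch(arms, threshold=0.5, delta=0.05))
```

With these means, arms 0 and 1 should eventually be output as good and arm 2 rejected; the number of pulls a decision takes grows as an arm's mean approaches the threshold and as the acceptance error rate δ shrinks, which is the dependence the paper's sample-complexity bounds quantify.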