Good arm identification via bandit feedback

Bibliographic Details
Published in: Machine Learning, 2019-05, Vol. 108 (5), p. 721-745
Main Authors: Kano, Hideaki, Honda, Junya, Sakamaki, Kentaro, Matsuura, Kentaro, Nakamura, Atsuyoshi, Sugiyama, Masashi
Format: Article
Language: English
Description
Summary: We consider a novel stochastic multi-armed bandit problem called good arm identification (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold. GAI is a pure-exploration problem in which a single agent repeats a process of outputting an arm as soon as it is identified as a good one, before confirming that the other arms are actually not good. The objective of GAI is to minimize the number of samples for each process. We find that GAI faces a new kind of dilemma, the exploration-exploitation dilemma of confidence, which is different from the dilemma arising in best arm identification. As a result, an efficient design of algorithms for GAI is quite different from that for best arm identification. We derive a lower bound on the sample complexity of GAI that is tight up to the logarithmic factor O(log(1/δ)) for acceptance error rate δ. We also develop an algorithm whose sample complexity almost matches the lower bound. We confirm experimentally that our proposed algorithm outperforms naive algorithms in synthetic settings based on a conventional bandit problem and on clinical trial research for rheumatoid arthritis.
ISSN: 0885-6125
eISSN: 1573-0565
DOI: 10.1007/s10994-019-05784-4
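The abstract describes the GAI protocol (output each arm as soon as it is confidently good, stop once every arm is classified) but does not spell out the proposed algorithm. The following is a minimal Python sketch of one plausible confidence-bound approach under those rules, assuming rewards in [0, 1] and Hoeffding-style anytime bounds; the function gai_sketch, the sampling rule, and the specific confidence radius are illustrative assumptions, not the authors' method.

import math
import random

def gai_sketch(pull, n_arms, threshold, delta, max_samples=100_000):
    """Identify 'good' arms (mean reward >= threshold) one by one.

    pull(i) returns a reward in [0, 1] for arm i. An arm is output as
    good the moment its lower confidence bound clears the threshold,
    and discarded once its upper confidence bound falls below it.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = set(range(n_arms))   # arms not yet classified
    good = []                     # arms identified as good, in output order

    def bounds(i):
        n = counts[i]
        mean = sums[i] / n
        # Illustrative anytime confidence radius (not from the paper):
        # the log term grows with n so the bound stays valid over all
        # rounds via a union bound over arms and time.
        rad = math.sqrt(math.log(4 * n_arms * n * n / delta) / (2 * n))
        return mean - rad, mean + rad

    for _ in range(max_samples):
        if not active:
            break
        # Pull each active arm once before trusting its bounds; then
        # sample the active arm with the highest upper confidence bound.
        unseen = [i for i in active if counts[i] == 0]
        i = unseen[0] if unseen else max(active, key=lambda j: bounds(j)[1])
        r = pull(i)
        counts[i] += 1
        sums[i] += r
        lcb, ucb = bounds(i)
        if lcb >= threshold:      # confidently good: output immediately
            good.append(i)
            active.remove(i)
        elif ucb < threshold:     # confidently not good: discard
            active.remove(i)
    return good

# Usage with hypothetical Bernoulli arms and threshold 0.5:
means = [0.3, 0.55, 0.7]
found = gai_sketch(lambda i: float(random.random() < means[i]),
                   n_arms=3, threshold=0.5, delta=0.05)
print(found)  # typically [2, 1]: the easier good arm is output first

The sketch reflects the order-sensitive objective the abstract emphasizes: arms are reported one by one as they are confirmed good, rather than all at once after every arm is resolved, so easy good arms incur a small sample cost regardless of how hard the borderline arms are.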