Loading…
Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks
This paper presents an online algorithm Delaytron for learning multiclass classifiers using delayed bandit feedback. At the t-th round, the algorithm observes an example \mathbf{x}_{t} and predicts a label \tilde{y}_{t} and receives the bandit feedback II [\tilde{y}_{t}=y_{t}] only d_{t} rounds late...
Saved in:
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | This paper presents an online algorithm Delaytron for learning multiclass classifiers using delayed bandit feedback. At the t-th round, the algorithm observes an example \mathbf{x}_{t} and predicts a label \tilde{y}_{t} and receives the bandit feedback II [\tilde{y}_{t}=y_{t}] only d_{t} rounds later. When t+d_{t} > T , we consider the feedback for the t-th round missing. The sequence of feedback delays \{d_{t}\}_{t=1}^{T} is unknown to the algorithm. For the K -class classification problem, we show that the proposed algorithm achieves regret of \mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+(2+\frac{L^{2}}{R^{2}\Vert \mathbf{W}\Vert_{F}^{2}})\sum_{t-1}^{T}d_{t}\right]}\right) when the loss for each missing sample is upper bounded by L . In the case when the loss for missing samples is not upper-bounded, the regret achieved by Delaytron is \mathcal{O}\left(\sqrt{\frac{2K}{\gamma}\left[\frac{T}{2}+2\Sigma_{t=1}^{T}d_{t}+\vert \mathcal{M} \vert T\right]}\right) where \mathcal{M} is the set of missing samples in T rounds. These bounds were achieved with a constant step size which requires the knowledge of T and \sum_{t=1}^{T}d_{t} . When T and \sum_{t=1}^{T}d_{t} are unknown, we use a doubling trick for online learning and propose Adaptive Delaytron. We show that Adaptive Delaytron achieves a regret bound of \mathcal{O}\left(\sqrt{T+\sum_{t=1}^{T}d_{t}}\right) . We experimentally show that the proposed approach can learn efficient classifiers even with delayed bandit feedback, and the accuracy does not degrade much due to delays in feedback. |
---|---|
ISSN: | 2161-4407 |
DOI: | 10.1109/IJCNN54540.2023.10191245 |