Batched Learning in Generalized Linear Contextual Bandits With General Decision Sets
Published in: | IEEE Control Systems Letters, 2022, Vol. 6, pp. 37-42 |
---|---|
Format: | Article |
Language: | English |
Summary: | In real-world adaptive personalized decision making, due to physical and/or resource constraints, a decision maker often does not have the luxury of immediately incorporating the feedback from the previous individual into forming new policies for future individuals. This is an important aspect that has largely been abstracted away from the traditional online learning and decision-making literature. In this letter, we study the problem of batched learning in generalized linear contextual bandits where the decision maker, unlike in traditional online learning, can only access feedback at the end of a limited number of batches and, when selecting actions within a batch, can only use information from prior batches. We provide a lower bound that characterizes the fundamental limit of performance in this setting and then give a UCB-based batched learning algorithm whose regret bound, obtained using a self-normalized martingale-style analysis, nearly matches this lower bound. Our results provide a novel inquiry into generalized linear contextual bandits with arbitrary action sets, which include several bandit settings as special cases, and thus shed light on batch-constrained decision making in general. |
ISSN: | 2475-1456 |
DOI: | 10.1109/LCSYS.2020.3047601 |