Batched Learning in Generalized Linear Contextual Bandits With General Decision Sets

Bibliographic Details
Published in:	IEEE Control Systems Letters, 2022, Vol. 6, p. 37-42
Main Authors:	Ren, Zhimei; Zhou, Zhengyuan; Kalagnanam, Jayant R.
Format:	Article
Language:	English
Subjects:	Adaptive control; Decision making; Medical treatment; Random variables; Sociology; Statistical learning; Statistics; Upper bound

Description
Summary:	In real-world adaptive personalized decision making, due to physical and/or resource constraints, a decision maker often does not have the luxury of immediately incorporating the feedback from the previous individual into forming new policies for future individuals. This is an important aspect that has largely been abstracted away from the traditional online learning/decision making literature. In this letter, we study the problem of batched learning in generalized linear contextual bandits where the decision maker, unlike in traditional online learning, can only access feedback at the end of a limited number of batches, and when selecting actions within a batch, can only use information from prior batches. We provide a lower bound that characterizes the fundamental limit of performance in this setting and then give a UCB-based batched learning algorithm whose regret bound, obtained using a self-normalized martingale-style analysis, nearly matches this lower bound. Our results provide a novel inquiry into generalized linear contextual bandits with arbitrary action sets, which include several bandit settings as special cases and thus shed light on batch-constrained decision making in general.
ISSN:	2475-1456
DOI:	10.1109/LCSYS.2020.3047601
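
The batch constraint described in the summary can be illustrated with a minimal sketch. This is not the authors' algorithm: the logistic link, the equal-size batch grid, the fixed confidence width `alpha`, the ridge parameter `lam`, the randomly drawn unit-norm action sets, and the names `fit_glm` and `batched_glm_ucb` are all assumptions made for illustration. The one feature taken from the summary is that actions inside a batch use only the estimate built from prior batches, and feedback is incorporated only at batch ends.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_glm(X, y, lam=1.0, iters=25):
    """Ridge-regularized logistic fit via Newton steps (illustrative only)."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        hess = (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta

def batched_glm_ucb(theta_star, T=2000, M=4, K=10, alpha=1.0, lam=1.0, seed=0):
    """Hypothetical batched UCB loop: T rounds, M batches, K actions per round."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    # Assumption: equal-size batches; the batch grid is a design choice.
    batch_ends = set(int(e) for e in np.linspace(0, T, M + 1)[1:])
    theta_hat = np.zeros(d)               # estimate frozen within a batch
    V_inv = np.linalg.inv(lam * np.eye(d))  # inverse design matrix, also frozen
    X_hist, y_hist = [], []
    regret, updates = 0.0, 0
    for t in range(T):
        # Assumption: a fresh set of K unit-norm action vectors each round.
        actions = rng.normal(size=(K, d))
        actions /= np.linalg.norm(actions, axis=1, keepdims=True)
        # UCB index built only from information gathered in prior batches.
        bonus = np.sqrt(np.einsum("kd,de,ke->k", actions, V_inv, actions))
        idx = int(np.argmax(actions @ theta_hat + alpha * bonus))
        means = sigmoid(actions @ theta_star)
        reward = rng.binomial(1, means[idx])
        regret += means.max() - means[idx]
        # Feedback is stored but not used until the batch closes.
        X_hist.append(actions[idx])
        y_hist.append(float(reward))
        if (t + 1) in batch_ends:
            X, y = np.array(X_hist), np.array(y_hist)
            theta_hat = fit_glm(X, y, lam)
            V_inv = np.linalg.inv(lam * np.eye(d) + X.T @ X)
            updates += 1
    return theta_hat, regret, updates
```

Calling `batched_glm_ucb(np.array([1.0, -0.5, 0.25]), T=200, M=4, K=5)` runs 200 rounds while refitting the model only four times, once at the end of each batch. Setting `M = T` recovers a fully sequential update schedule in this sketch, which makes the cost of batching easy to compare empirically.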