
Functional Sequential Treatment Allocation

Bibliographic Details
Published in:	Journal of the American Statistical Association 2022-09, Vol.117 (539), p.1311-1323
Main Authors:	Kock, Anders Bredahl; Preinerstorfer, David; Veliyev, Bezirgen
Format:	Article
Language:	English
Subjects:	Algorithms; Decision making; Decision theory; Distributional characteristics; Effectiveness; Inequality; Machine learning; Minimax optimal expected regret; Minimax technique; Multi-armed bandit problems; Multi-armed bandits; Policies; Poverty; Randomized controlled trials; Regression analysis; Regret; Robustness; Sequential treatment allocation; Socioeconomic factors; Statistical methods; Statistics; Welfare

Description
Summary:	Consider a setting in which a policy maker assigns subjects to treatments, observing each outcome before the next subject arrives. Initially, it is unknown which treatment is best, but the sequential nature of the problem permits learning about the effectiveness of the treatments. While the multi-armed bandit literature has shed much light on the situation when the policy maker compares the effectiveness of the treatments through their means, much less is known about other targets. This is restrictive, because a cautious decision maker may prefer to target a robust location measure such as a quantile or a trimmed mean. Furthermore, socio-economic decision making often requires targeting purpose-specific characteristics of the outcome distribution, such as its inherent degree of inequality, welfare, or poverty. In the present article, we introduce and study sequential learning algorithms when the distributional characteristic of interest is a general functional of the outcome distribution. Minimax expected regret optimality results are obtained within the subclass of explore-then-commit policies, and for the unrestricted class of all policies. Supplementary materials for this article are available online.
ISSN:	0162-1459; 1537-274X
DOI:	10.1080/01621459.2020.1851236
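
Illustration:	The summary mentions explore-then-commit policies that target a general functional of the outcome distribution rather than the mean. The following is a minimal generic sketch of that idea, not the article's algorithm: the function name, the round-robin exploration, the fixed exploration length, and the toy treatments are all assumptions made for illustration, and the record does not state how the authors choose the exploration length or which functionals they analyze.

```python
import statistics
from itertools import cycle


def explore_then_commit(arms, n_subjects, n_explore_per_arm, functional):
    """Generic explore-then-commit sketch (hypothetical helper, not from the article).

    arms: list of zero-argument callables, each returning one outcome.
    functional: maps a list of outcomes to a number; larger is treated as better.
    Returns the index of the committed treatment and the full assignment sequence.
    """
    n_explore = n_explore_per_arm * len(arms)
    if n_explore_per_arm < 1 or n_explore > n_subjects:
        raise ValueError("exploration phase must be non-empty and fit within n_subjects")

    outcomes = [[] for _ in arms]
    assignments = []

    # Exploration phase: assign subjects to treatments in round-robin order.
    for t in range(n_explore):
        k = t % len(arms)
        outcomes[k].append(arms[k]())
        assignments.append(k)

    # Commit phase: evaluate the target functional on each treatment's
    # observed outcomes and assign all remaining subjects to the best one.
    estimates = [functional(obs) for obs in outcomes]
    best = max(range(len(arms)), key=lambda k: estimates[k])
    for _ in range(n_subjects - n_explore):
        outcomes[best].append(arms[best]())
        assignments.append(best)

    return best, assignments


def make_arms():
    """Two toy treatments (assumed for illustration only)."""
    skewed = cycle([1.0, 1.0, 1.0, 1.0, 100.0])  # high mean, low median
    return [lambda: next(skewed), lambda: 2.0]   # second arm: constant outcome


# Targeting the median and targeting the mean lead to different commitments.
best_by_median, _ = explore_then_commit(make_arms(), 100, 10, statistics.median)
best_by_mean, _ = explore_then_commit(make_arms(), 100, 10, statistics.mean)
print(best_by_median, best_by_mean)
```

In this toy setup the occasional large outcome of the first treatment raises its mean but not its median, so the choice of functional changes which treatment the policy commits to. This is the kind of distinction the summary draws between mean-based and robust targets.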