Always Valid Inference: Continuous Monitoring of A/B Tests

Bibliographic Details
Published in:	Operations Research, 2022-05, Vol. 70 (3), pp. 1806-1821
Main Authors:	Johari, Ramesh, Koomen, Pete, Pekelis, Leonid, Walsh, David
Format:	Article
Language:	English
Subjects:	A/B testing; Confidence intervals; Decision making; Hypotheses; Hypothesis testing; Monitoring; Monitoring systems; Multiple hypothesis testing; Operations research; Sequential hypothesis testing; Statistical analysis; Statistical inference; Stochastic Models; values

Description
Summary:	A/B tests are typically analyzed via frequentist p-values and confidence intervals, but these inferences are wholly unreliable if users endogenously choose sample sizes by continuously monitoring their tests. We define always valid p-values and confidence intervals that let users try to take advantage of data as fast as it becomes available, providing valid statistical inference whenever they make their decision. Always valid inference can be interpreted as a natural interface for a sequential hypothesis test, which empowers users to implement a modified test tailored to them. In particular, we show in an appropriate sense that the measures we develop trade off sample size and power efficiently, despite a lack of prior knowledge of the user's relative preference between these two goals. We also use always valid p-values to obtain multiple hypothesis testing control in the sequential context. Our methodology has been implemented in a large-scale commercial A/B testing platform to analyze hundreds of thousands of experiments to date.
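To make the idea of an always valid p-value concrete, here is a minimal sketch using one standard construction, the mixture sequential probability ratio test (mSPRT). It is not presented as the authors' or the platform's implementation. It assumes i.i.d. normal observations with known variance sigma2 (for example, hypothetical per-pair treatment-minus-control differences), a null H0: theta = theta0, and a normal mixing distribution N(theta0, tau2) over alternatives, where tau2 is a tuning parameter. The mixture likelihood ratio Lambda_n is a nonnegative mean-one martingale under H0, so by Ville's inequality the chance it ever reaches 1/alpha is at most alpha. The running p-value p_n = min(p_{n-1}, 1/Lambda_n) can therefore be checked after every observation without inflating the false-positive rate. All function names are hypothetical.

```python
import math
import random
from typing import Iterable, List, Optional


def msprt_log_lr(n: int, mean: float, theta0: float, sigma2: float, tau2: float) -> float:
    """Log of the normal-mixture likelihood ratio Lambda_n after n observations.

    Assumption: observations are i.i.d. N(theta, sigma2) with sigma2 known,
    H0 is theta = theta0, and alternatives are mixed over theta ~ N(theta0, tau2).
    """
    denom = sigma2 + n * tau2
    return 0.5 * math.log(sigma2 / denom) + (
        n * n * tau2 * (mean - theta0) ** 2
    ) / (2.0 * sigma2 * denom)


def always_valid_p_values(
    xs: Iterable[float], theta0: float, sigma2: float, tau2: float
) -> List[float]:
    """Running always-valid p-values: p_0 = 1, p_n = min(p_{n-1}, 1 / Lambda_n)."""
    p = 1.0
    total = 0.0
    out: List[float] = []
    for n, x in enumerate(xs, start=1):
        total += x
        log_lr = msprt_log_lr(n, total / n, theta0, sigma2, tau2)
        # Work in log space; exp(-log_lr) underflows harmlessly to 0.0 for huge Lambda_n.
        p = min(p, math.exp(-log_lr))
        out.append(p)
    return out


def first_rejection(p_values: Iterable[float], alpha: float) -> Optional[int]:
    """1-based index of the first observation at which p_n <= alpha, or None if never."""
    for n, p in enumerate(p_values, start=1):
        if p <= alpha:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stream of differences with a true lift of 0.2 and unit variance.
    stream = [rng.gauss(0.2, 1.0) for _ in range(2000)]
    ps = always_valid_p_values(stream, theta0=0.0, sigma2=1.0, tau2=0.1)
    print("stop at observation:", first_rejection(ps, alpha=0.05))
```

Because p_n is non-increasing and valid at every n, a user may peek as often as they like and stop the first time p_n <= alpha. The choice of tau2 shifts which effect sizes are detected quickly, which loosely reflects the sample-size versus power trade-off the summary describes.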
ISSN:	0030-364X; 1526-5463
DOI:	10.1287/opre.2021.2135