
The conditional permutation test for independence while controlling for confounders

Bibliographic Details
Published in: Journal of the Royal Statistical Society. Series B, Statistical methodology, 2020-02, Vol.82 (1), p.175-197
Main Authors: Berrett, Thomas B., Wang, Yi, Barber, Rina Foygel, Samworth, Richard J.
Format: Article
Language: English
Description
Summary: We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high dimensional random vector Z that may contain confounding factors. The test permutes entries of X non‐uniformly, to respect the existing dependence between X and Z and thus to account for the presence of these confounders. Like the conditional randomization test of Candès and co‐workers (2018), our test relies on the availability of an approximation to the distribution of X|Z. Whereas their test uses this estimate to draw new X‐values, our test uses it to design an appropriate non‐uniform distribution on permutations of the X‐values already seen in the true data. We provide an efficient Markov chain Monte Carlo sampler for the implementation of our method and establish bounds on the type I error in terms of the error in the approximation of the conditional distribution of X|Z. For the worst‐case test statistic, the inflation in type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set.
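The idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the sampler's details, so the random disjoint pair-swap Markov chain below is one plausible choice, and the Gaussian working model for X|Z, the function names (`log_q`, `pairwise_step`, `cpt_pvalue`) and the parameters (`n_copies`, `n_steps`) are all assumptions made for illustration.

```python
import numpy as np


def log_q(x, mu, sigma):
    """Log density of the ASSUMED working model X | Z ~ N(mu(Z), sigma^2), up to a constant."""
    return -0.5 * ((x - mu) / sigma) ** 2


def pairwise_step(x_perm, mu, sigma, rng):
    """One sweep: pair up indices at random and swap each pair's X-values
    with probability odds / (1 + odds), where odds compares the working-model
    likelihood after the swap with the likelihood before it."""
    n = len(x_perm)
    idx = rng.permutation(n)
    half = n // 2
    i, j = idx[:half], idx[half:2 * half]  # disjoint pairs (i[k], j[k])
    log_odds = (log_q(x_perm[j], mu[i], sigma) + log_q(x_perm[i], mu[j], sigma)
                - log_q(x_perm[i], mu[i], sigma) - log_q(x_perm[j], mu[j], sigma))
    p_swap = np.exp(-np.logaddexp(0.0, -log_odds))  # stable logistic function
    swap = rng.random(half) < p_swap
    out = x_perm.copy()
    out[i[swap]], out[j[swap]] = x_perm[j[swap]], x_perm[i[swap]]
    return out


def cpt_pvalue(x, y, mu, sigma, stat, n_copies=100, n_steps=30, seed=0):
    """Permutation p-value: compare stat on the observed X with stat on
    non-uniformly permuted copies of X that respect the X-Z dependence."""
    rng = np.random.default_rng(seed)
    # Run the chain from the observed X to a common starting point, then
    # branch independent chains from it to produce the permuted copies.
    hub = x.copy()
    for _ in range(n_steps):
        hub = pairwise_step(hub, mu, sigma, rng)
    t_obs = stat(x, y)
    count = 0
    for _ in range(n_copies):
        x_m = hub.copy()
        for _ in range(n_steps):
            x_m = pairwise_step(x_m, mu, sigma, rng)
        count += int(stat(x_m, y) >= t_obs)
    return (1 + count) / (1 + n_copies)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    z = rng.normal(size=n)
    x = z + rng.normal(size=n)            # X depends on the confounder Z
    y = 2.0 * x + rng.normal(size=n)      # Y depends on X, so the null is false
    stat = lambda xs, ys: abs(np.corrcoef(xs - z, ys)[0, 1])
    print(cpt_pvalue(x, y, mu=z, sigma=1.0, stat=stat))
```

Every permuted copy reuses exactly the observed X-values; only their assignment to rows changes. This is the contrast with the conditional randomization test, which would draw fresh X-values from the estimated distribution of X|Z.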
ISSN: 1369-7412, 1467-9868
DOI: 10.1111/rssb.12340