Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Bibliographic Details
Published in: Mathematical Statistics and Learning (Online), 2024-03, Vol. 7 (1), pp. 41-153
Main Authors: Mou, Wenlong, Pananjady, Ashwin, Wainwright, Martin J., Bartlett, Peter L.
Format: Article
Language:English
Description
Summary: We study stochastic approximation procedures for approximately solving a d-dimensional linear fixed-point equation based on observing a trajectory of length n from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order t_{\mathrm{mix}}\frac{d}{n} on the squared error of the last iterate of a standard scheme, where t_{\mathrm{mix}} is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters (d, t_{\mathrm{mix}}) in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise—covering the \mathrm{TD}(\lambda) family of algorithms for all \lambda \in [0, 1)—and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of \lambda when running the \mathrm{TD}(\lambda) algorithm).
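The averaged stochastic approximation scheme described in the abstract can be illustrated with a minimal sketch: run the SA recursion θ ← θ + η(b(s) − A(s)θ) along a Markov trajectory and report the running average of the iterates (Polyak–Ruppert averaging). The two-state chain, the matrices A(s), the vectors b(s), and the step size η below are illustrative assumptions, not the paper's setup; the stationary averages of (A(s), b(s)) define the linear fixed-point equation being solved.

```python
import numpy as np

# Illustrative sketch (not the paper's exact construction): averaged
# linear SA for solving A_bar @ theta = b_bar, where (A(s), b(s)) are
# observed along a trajectory of an ergodic two-state Markov chain.
rng = np.random.default_rng(0)

# Transition matrix P and its stationary distribution pi.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])  # solves pi @ P = pi

# State-dependent observations; their stationary averages define the
# fixed-point equation A_bar @ theta_star = b_bar.
A = {0: np.array([[2.0, 0.0], [0.0, 1.0]]),
     1: np.array([[1.0, 0.5], [0.5, 3.0]])}
b = {0: np.array([1.0, 0.0]),
     1: np.array([0.0, 2.0])}

A_bar = pi[0] * A[0] + pi[1] * A[1]
b_bar = pi[0] * b[0] + pi[1] * b[1]
theta_star = np.linalg.solve(A_bar, b_bar)

# SA recursion with a constant step size, plus running iterate average.
n, eta = 100_000, 0.02
s = 0
theta = np.zeros(2)
theta_avg = np.zeros(2)
for t in range(n):
    theta = theta + eta * (b[s] - A[s] @ theta)
    theta_avg += (theta - theta_avg) / (t + 1)  # running average
    s = rng.choice(2, p=P[s])                   # next Markov state

print(np.linalg.norm(theta_avg - theta_star))
```

The averaged iterate converges to the fixed point θ* much faster than the last iterate; the paper's results quantify this gap non-asymptotically, including how the error depends on the mixing time of the chain.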
ISSN:2520-2316
2520-2324
DOI:10.4171/msl/44