
Error bounds for constant step-size Q-learning

Bibliographic Details
Published in: Systems & Control Letters, December 2012, Vol. 61 (12), pp. 1203-1208
Main Authors: Beck, C.L., Srikant, R.
Format: Article
Language:English
Description
Summary: We provide a bound on the first moment of the error in the Q-function estimate resulting from fixed step-size algorithms applied to finite state-space, discounted reward Markov decision problems. Motivated by Tsitsiklis’ proof for the decreasing step-size case, we decompose the Q-learning update equations into a dynamical system driven by a noise sequence and another dynamical system whose state variable is the Q-learning error, i.e., the difference between the true Q-function and the estimate. A natural persistence of excitation condition allows us to sample the system periodically and derive a simple scalar difference equation from which the convergence properties and bounds on the error of the Q-learning algorithm can be derived.
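
For readers unfamiliar with the setting, the sketch below shows tabular Q-learning with a constant step size on a small synthetic MDP, together with a direct measurement of the estimation error against the true Q-function. The toy MDP, the uniformly random behavior policy, and all variable names are illustrative assumptions and are not taken from the paper; the random behavior policy merely stands in for the persistence-of-excitation condition by ensuring every state-action pair keeps being visited.

import numpy as np

# Illustrative sketch only: constant step-size tabular Q-learning on a toy MDP.
# The MDP, behavior policy, and all names below are assumptions for illustration,
# not taken from Beck & Srikant (2012).

rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
gamma = 0.9          # discount factor
eps = 0.05           # constant (non-decreasing) step size

# Random transition kernel P[s, a, s'] and bounded rewards r[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# True Q-function via value iteration on the known model (used only to measure error).
Q_star = np.zeros((n_states, n_actions))
for _ in range(2000):
    Q_star = r + gamma * P @ Q_star.max(axis=1)

# Constant step-size Q-learning driven by a uniformly random behavior policy,
# so that every (s, a) pair keeps being visited.
Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(200_000):
    a = rng.integers(n_actions)
    s_next = rng.choice(n_states, p=P[s, a])
    td_target = r[s, a] + gamma * Q[s_next].max()
    Q[s, a] += eps * (td_target - Q[s, a])   # fixed step size eps
    s = s_next

# With a constant step size the error does not vanish; it settles at a
# steady-state level that shrinks as eps is made smaller.
print("mean |Q - Q*|:", np.abs(Q - Q_star).mean())

Running the sketch with smaller values of eps illustrates the trade-off between a lower steady-state error and slower convergence, which is the regime analyzed in the paper.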
ISSN: 0167-6911, 1872-7956
DOI: 10.1016/j.sysconle.2012.08.014