
Value iteration for streaming data on a continuous space with gradient method in an RKHS

Bibliographic Details
Published in: Neural Networks 2023-09, Vol. 166, p. 437-445
Main Authors: Liu, Jiamin, Xu, Wangli, Wang, Yue, Lian, Heng
Format: Article
Language:English
Description
Summary: The classical theory of reinforcement learning has focused on the tabular setting, where states and actions are finite, or on linear representations of the value function in a finite-dimensional approximation. Establishing theory for general continuous state and action spaces requires a careful treatment of the complexity of appropriately chosen function spaces and of the iterative update of the value function when stochastic gradient descent (SGD) is used. For the classical prediction problem in reinforcement learning based on i.i.d. streaming data, in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity that takes into account the smoothness of the value function. In particular, we prove that the gradient descent algorithm efficiently computes the value function with appropriately chosen step sizes, with a convergence rate that can be close to 1/N, the best possible rate for parametric SGD. The advantages of the gradient descent algorithm include its computational convenience and its natural handling of streaming data.
ISSN:0893-6080
1879-2782
DOI:10.1016/j.neunet.2023.07.036
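The prediction setting the abstract describes — a value function represented in an RKHS and updated by stochastic gradient steps on streaming transitions — can be illustrated with a short sketch. This is not the paper's algorithm: the Gaussian kernel, the TD(0)-style semi-gradient update, and the 1/t step-size schedule below are assumptions chosen for the example, and the kernel expansion is kept in its naive form (one center per observed transition).

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    # RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2));
    # the bandwidth value is an illustrative choice.
    return np.exp(-np.sum((x - y) ** 2) / (2 * bandwidth ** 2))

class KernelTD:
    """Streaming value prediction with a functional-gradient (semi-gradient
    TD(0)) update in an RKHS.

    The value estimate has the kernel expansion
        V(x) = sum_i alpha_i * k(c_i, x),
    and each incoming transition (x, r, x') appends one center c = x with
    weight alpha = eta_t * delta_t, where delta_t is the TD error.
    """

    def __init__(self, kernel, gamma=0.9):
        self.kernel = kernel
        self.gamma = gamma      # discount factor
        self.centers = []       # observed states serving as kernel centers
        self.alphas = []        # expansion coefficients
        self.t = 0              # number of transitions seen

    def value(self, x):
        # Evaluate V(x) from the current kernel expansion.
        return sum(a * self.kernel(c, x)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, reward, x_next):
        # One SGD step on a single streaming transition.
        self.t += 1
        eta = 1.0 / self.t  # decaying step size (schedule is an assumption)
        td_error = reward + self.gamma * self.value(x_next) - self.value(x)
        self.centers.append(np.asarray(x, dtype=float))
        self.alphas.append(eta * td_error)
```

On a toy Markov reward process with constant reward 1 and discount 0.5, the TD fixed point is V(x) = 2 everywhere, and the estimate drifts toward it as transitions stream in. Note the naive expansion grows with the sample size; practical kernel methods would sparsify or project the expansion.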