Loading…
Value iteration for streaming data on a continuous space with gradient method in an RKHS
The classical theory of reinforcement learning focused on the tabular setting when states and actions are finite, or for linear representation of the value function in a finite-dimensional approximation. Establishing theory on general continuous state and action space requires a careful treatment of...
Saved in:
Published in: | Neural networks 2023-09, Vol.166, p.437-445 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The classical theory of reinforcement learning focused on the tabular setting when states and actions are finite, or for linear representation of the value function in a finite-dimensional approximation. Establishing theory on general continuous state and action space requires a careful treatment of complexity theory of appropriately chosen function spaces and the iterative update of the value function when stochastic gradient descent (SGD) is used. For the classical prediction problem in reinforcement learning based on i.i.d. streaming data in the framework of reproducing kernel Hilbert spaces, we establish polynomial sample complexity taking into account the smoothness of the value function. In particular, we prove that the gradient descent algorithm efficiently computes the value function with appropriately chosen step sizes, with a convergence rate that can be close to 1/N, which is the best possible rate for parametric SGD. The advantages of using the gradient descent algorithm include its computational convenience and it can naturally deal with streaming data. |
---|---|
ISSN: | 0893-6080 1879-2782 |
DOI: | 10.1016/j.neunet.2023.07.036 |