Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control

Bibliographic Details
Published in: IEEE Transactions on Cybernetics, 2017-10, Vol. 47 (10), pp. 3341-3354
Main Authors: Biao Luo, Derong Liu, Huai-Ning Wu, Ding Wang, Frank L. Lewis
Format: Article
Language: English
Description
Summary: This paper considers the model-free optimal control problem for general discrete-time nonlinear systems and develops a data-based policy gradient adaptive dynamic programming (PGADP) algorithm to design an adaptive optimal controller. Using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, and the approximate Q-function is shown to converge to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
ISSN: 2168-2267, 2168-2275
DOI: 10.1109/TCYB.2016.2623859
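
Illustrative sketch (not taken from the paper): the summary describes a loop in which a critic estimates a Q-function from data and an actor improves the control policy by gradient descent. The Python below is one possible reading of that loop on a toy problem, under loudly stated assumptions: a scalar linear plant, a quadratic cost, a quadratic Q-function basis, a least-squares critic standing in for the method of weighted residuals, and an actor step taken along the gradient of Q with respect to the action. All function names, the plant parameters, the step size `alpha`, and the data grid are hypothetical choices made for illustration.

```python
# Toy sketch only: scalar plant, quadratic cost, policy u = -K * x.
# Names and numbers are illustrative assumptions, not the paper's method.

def features(x, u):
    """Quadratic basis so that Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2."""
    return [x * x, 2.0 * x * u, u * u]


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def evaluate_policy(data, K):
    """Critic: least-squares fit of Q for the policy u = -K*x, using the
    Bellman equation Q(x, u) = cost + Q(x_next, -K * x_next) on data tuples."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, u, cost, x_next in data:
        p = features(x, u)
        p_next = features(x_next, -K * x_next)
        d = [p[i] - p_next[i] for i in range(3)]
        for i in range(3):
            b[i] += d[i] * cost
            for j in range(3):
                A[i][j] += d[i] * d[j]
    return solve_linear(A, b)  # [h_xx, h_xu, h_uu]


def improve_policy(K, theta, alpha):
    """Actor: u_new(x) = u(x) - alpha * dQ/du evaluated at u = -K*x.
    For the quadratic Q this reduces to an update of the gain K."""
    _, h_xu, h_uu = theta
    return K + 2.0 * alpha * (h_xu - h_uu * K)


def collect_data(a=1.0, b=1.0):
    """Stand-in for recorded data; a and b are used only to generate
    (x, u, cost, x_next) tuples and are never seen by the learner."""
    data = []
    for x in (-1.0, -0.5, 0.5, 1.0):
        for u in (-1.0, 0.0, 1.0):
            data.append((x, u, x * x + u * u, a * x + b * u))
    return data


def run_pgadp_sketch(K0=0.3, alpha=0.1, iterations=30):
    """Alternate critic and actor steps, starting from a stabilizing gain."""
    data = collect_data()
    K = K0
    for _ in range(iterations):
        theta = evaluate_policy(data, K)
        K = improve_policy(K, theta, alpha)
    return K
```

Calling `run_pgadp_sketch()` returns the learned feedback gain. For this toy plant (x_next = x + u, cost x^2 + u^2) the optimal gain can be worked out by hand as about 0.618, so the result can be checked against it. The critic and actor touch only the recorded tuples, which is the sense in which the sketch is model-free. The step size must be small enough for the gain to stay stabilizing, and the nonlinear setting treated in the paper would need a richer function approximator than this quadratic basis.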