Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control
Published in: | IEEE transactions on cybernetics 2017-10, Vol.47 (10), p.3341-3354 |
---|---|
Format: | Article |
Language: | English |
Summary: | This paper considers the model-free optimal control problem for general discrete-time nonlinear systems and develops a data-based policy gradient adaptive dynamic programming (PGADP) algorithm for designing an adaptive optimal controller. Using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. Convergence of the PGADP algorithm is proved by showing that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, an adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, showing that the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method. |
---|---|
ISSN: | 2168-2267, 2168-2275 |
DOI: | 10.1109/TCYB.2016.2623859 |