
Adaptive Observation-Based Efficient Reinforcement Learning for Uncertain Systems

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2022-10, Vol.33 (10), p.5492-5503
Main Authors:	Ran, Maopeng, Xie, Lihua
Format:	Article
Language:	English
Subjects:	Adaptation models; Adaptive observer; Adaptive systems; concurrent learning (CL); Convergence; Data models; Estimation; Excitation; Learning; Observational learning; Observers; Optimal control; Parameter estimation; Reinforcement; reinforcement learning (RL); Theoretical analysis; Uncertain systems

Description
Summary:	This article develops an adaptive observation-based efficient reinforcement learning (RL) approach for systems with uncertain drift dynamics. A novel concurrent learning adaptive extended observer (CL-AEO) is first designed to jointly estimate the system state and parameter. This observer has a two-time-scale structure and does not require any additional numerical techniques to calculate the state derivative information. The idea of concurrent learning (CL) is leveraged to reuse recorded data, which relaxes the condition for convergence of the parameter estimate to an excitation condition that can be verified. Based on the estimated state and parameter provided by the CL-AEO, a simulation-of-experience-based RL scheme is developed to approximate the optimal control policy online. Rigorous theoretical analysis shows that practical convergence of the system state to the origin, and of the developed policy to the ideal optimal policy, can be achieved without the persistence of excitation (PE) condition. Finally, the effectiveness and superiority of the developed methodology are demonstrated via comparative simulations.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2021.3070852
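
The concurrent learning idea mentioned in the summary can be illustrated with a minimal sketch. This is not the article's CL-AEO, which estimates state and parameter jointly through an observer; it only shows the generic CL principle under simplifying assumptions: a linearly parameterized output y = phi^T theta, regressors and outputs that are measured directly and stored in a history stack, and a forward-Euler discretization of the update law. All names here (history_is_exciting, cl_update, estimate, the gain and step size) are hypothetical and chosen for illustration.

```python
import numpy as np

def history_is_exciting(Phi, tol=1e-8):
    """Check the excitation condition on recorded data.

    Phi stacks the recorded regressors phi_j as rows. The condition holds
    when sum_j phi_j phi_j^T = Phi^T Phi is positive definite, which can be
    checked from the stored data alone.
    """
    gram = Phi.T @ Phi
    return np.linalg.eigvalsh(gram)[0] > tol  # smallest eigenvalue

def cl_update(theta_hat, Phi, Y, gain, dt):
    """One Euler step of theta_hat' = gain * sum_j phi_j (y_j - phi_j^T theta_hat)."""
    errors = Y - Phi @ theta_hat           # prediction error on every stored sample
    return theta_hat + dt * gain * (Phi.T @ errors)

def estimate(Phi, Y, gain=1.0, dt=0.01, steps=5000):
    """Iterate the update on a fixed history stack (assumed already recorded)."""
    theta_hat = np.zeros(Phi.shape[1])
    for _ in range(steps):
        theta_hat = cl_update(theta_hat, Phi, Y, gain, dt)
    return theta_hat

# Toy data (assumed, not from the article): three recorded samples, two parameters.
theta_true = np.array([2.0, -1.0])
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
Y = Phi @ theta_true

theta_hat = estimate(Phi, Y)
```

In this sketch the estimate converges because the stored regressors span the parameter space, even though no new data arrives during the iteration. A stack whose regressors are all parallel would fail the check in history_is_exciting, and the estimate would then converge only along the excited direction.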