
Policy Iteration-Based Learning Design for Linear Continuous-Time Systems Under Initial Stabilizing OPFB Policy


Bibliographic Details
Published in: IEEE Transactions on Cybernetics, 2024-11, Vol. 54 (11), p. 6707-6718
Main Authors: Zhang, Chengye; Chen, Ci; Lewis, Frank L.; Xie, Shengli
Format: Article
Language: English
Description
Summary: Policy iteration (PI), an iterative method in reinforcement learning, has the merit of interacting with a little-known environment to learn a decision law through policy evaluation and improvement. However, existing PI-based results for output-feedback (OPFB) continuous-time systems have relied heavily on an initial stabilizing full state-feedback (FSFB) policy, which raises the concern of violating the OPFB principle. This article addresses this concern and establishes PI under an initial stabilizing OPFB policy. We prove that an off-policy Bellman equation can transform any OPFB policy into an FSFB policy. Based on this transformation property, we revise the traditional PI by appending an additional iteration, which turns out to be efficient in approximating the optimal control under the initial OPFB policy. We show the effectiveness of the proposed learning methods through theoretical analysis and a case study.
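
For reference, the sketch below illustrates the traditional model-based PI (Kleinman's algorithm) for linear continuous-time systems that the summary contrasts against: each iteration evaluates the current gain through a Lyapunov equation and improves it from the resulting value matrix, and it requires an initial stabilizing full state-feedback gain K0, which is exactly the requirement the article removes. The system matrices, weights, and K0 here are illustrative assumptions; the article's data-driven, off-policy OPFB variant is not reproduced.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def policy_iteration_lqr(A, B, Q, R, K0, n_iter=20):
    # Traditional model-based PI (Kleinman's algorithm) for continuous-time LQR.
    K = K0
    for _ in range(n_iter):
        Ak = A - B @ K
        # Policy evaluation: solve (A - B K)^T P + P (A - B K) + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B^T P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Illustrative second-order example with a stabilizing initial FSFB gain.
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
K0 = np.array([[0.0, 4.0]])   # A - B K0 has eigenvalues -1, -1 (Hurwitz)
P, K = policy_iteration_lqr(A, B, Q, R, K0)
print("P =\n", P, "\nK =", K)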
ISSN: 2168-2267
2168-2275
DOI: 10.1109/TCYB.2024.3418190