
Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system that is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems with the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) methods. Based on these, three online I-RL algorithms, named explorized I-PI and integral Q-learning I and II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model free, and can simultaneously explore the state space in a stable manner during the online learning processes. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we show the design principles of the exploration for safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.

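A minimal sketch of the integral temporal-difference relation that underlies I-PI-style methods may help fix ideas. It assumes the standard input-affine setting with dynamics \dot{x} = f(x) + g(x)u, running cost Q(x) + u^{\top} R u, and a reinforcement interval T > 0; the symbols V_i, u_i, Q, R, and T are generic placeholders rather than the paper's exact notation.

\[
  V_i\bigl(x(t)\bigr) \;=\; \int_{t}^{t+T} \Bigl( Q\bigl(x(\tau)\bigr) + u_i\bigl(x(\tau)\bigr)^{\top} R\, u_i\bigl(x(\tau)\bigr) \Bigr)\, d\tau \;+\; V_i\bigl(x(t+T)\bigr)
\]
\[
  u_{i+1}(x) \;=\; -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V_i(x)
\]

The evaluation equation uses only measured trajectory data over [t, t+T] and never references the drift term f(x), while the improvement step needs only the input matrix g(x); this is the sense in which such iterations can be partially model free, and the explorized I-PI and integral Q-learning variants described in the abstract additionally account for the probing (exploration) signal applied during online learning.
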
Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2015-05, Vol. 26 (5), pp. 916-932
Main Authors: Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho
Format: Article
Language:English
Subjects: Adaptive optimal control; continuous-time (CT); Convergence; Equations; exploration; Heuristic algorithms; Nonlinear systems; Optimal control; policy iteration (PI); Q-learning; reinforcement learning (RL); Stability analysis
Publisher: IEEE
Source: IEEE Electronic Library (IEL) Journals
DOI: 10.1109/TNNLS.2014.2328590
ISSN: 2162-237X
EISSN: 2162-2388
PMID: 25163070