Linear Quadratic Control Using Model-Free Reinforcement Learning

Bibliographic Details
Published in: IEEE Transactions on Automatic Control, 2023-02, Vol. 68 (2), pp. 737-752
Main Authors: Yaghmaie, Farnaz Adib; Gustafsson, Fredrik; Ljung, Lennart
Format: Article
Language:English
Description: In this article, we consider the linear quadratic (LQ) control problem with process and measurement noise. We analyze the LQ problem in terms of the average cost and the structure of the value function. We assume that the dynamics of the linear system are unknown and that only noisy measurements of the state variable are available. Using these noisy measurements, we propose two model-free iterative algorithms to solve the LQ problem. The proposed algorithms are variants of the policy iteration routine in which the policy is greedy with respect to the average of all previous iterations. We rigorously analyze the properties of the proposed algorithms, including the stability of the generated controllers and convergence. We analyze the effect of measurement noise on the performance of the proposed algorithms, the classical off-policy routine, and the classical Q-learning routine. We also investigate a model-building approach, inspired by adaptive control, in which a model of the dynamical system is estimated and the optimal control problem is solved assuming that the estimated model is the true model. We use a benchmark to evaluate and compare our proposed algorithms with the classical off-policy, classical Q-learning, and policy gradient methods. We show that our model-building approach performs nearly identically to the analytical solution, and that our proposed policy-iteration-based algorithms outperform the classical off-policy and classical Q-learning algorithms on this benchmark but do not outperform the model-building approach.
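
The record itself contains no code. Purely as a rough illustration of the model-building (certainty-equivalence) idea mentioned in the description, the sketch below fits a linear model to noisy state measurements by least squares and then solves the discrete-time Riccati equation for the resulting feedback gain. The example system, noise levels, and data length are invented for illustration and are not taken from the paper; note also that plain least squares on noise-corrupted regressors is biased, which is part of what the article studies.

    import numpy as np
    from scipy.linalg import solve_discrete_are

    # Hypothetical 2-state, 1-input system used only for illustration;
    # the learner is assumed not to know A_true and B_true.
    rng = np.random.default_rng(0)
    A_true = np.array([[1.0, 0.1], [0.0, 0.9]])
    B_true = np.array([[0.0], [0.1]])
    Q = np.eye(2)                    # state cost
    R = np.eye(1)                    # input cost
    sigma_w, sigma_v = 0.01, 0.01    # process / measurement noise std

    # Collect data with an exploratory input; only noisy state measurements are stored.
    T = 500
    x = np.zeros((2, 1))
    Y, U, Yp = [], [], []
    for _ in range(T):
        u = rng.normal(scale=1.0, size=(1, 1))
        x_next = A_true @ x + B_true @ u + sigma_w * rng.normal(size=(2, 1))
        y = x + sigma_v * rng.normal(size=(2, 1))            # noisy measurement of current state
        y_next = x_next + sigma_v * rng.normal(size=(2, 1))  # noisy measurement of next state
        Y.append(y.ravel()); U.append(u.ravel()); Yp.append(y_next.ravel())
        x = x_next

    # Least-squares fit of [A B] mapping (measured state, input) to the next measured state.
    Z = np.hstack([np.array(Y), np.array(U)])                # T x (n + m) regressor matrix
    Theta, *_ = np.linalg.lstsq(Z, np.array(Yp), rcond=None)
    A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

    # Certainty equivalence: treat the estimate as the true model, solve the
    # discrete algebraic Riccati equation, and form the LQR gain for u = -K x.
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    print("estimated feedback gain K =", K)
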
DOI: 10.1109/TAC.2022.3145632
ISSN: 0018-9286
EISSN: 1558-2523
Source: IEEE Electronic Library (IEL) Journals
Subjects: Adaptation models
Adaptive control
Algorithms
Benchmarks
Cost analysis
Costs
Dynamical systems
Exact solutions
Heuristic algorithms
Iterative algorithms
Iterative methods
Linear quadratic (LQ) control
Machine learning
Noise measurement
Optimal control
Process control
reinforcement learning (RL)
Stability analysis
State variable