
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse-reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance.
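The abstract above outlines the core mechanism: both the high-level skill-selection policy and the low-level action policy are optimized with RL while being initialized and regularized by priors learned from offline demonstrations. The sketch below illustrates one way such demonstration-guided regularization can be set up; it is not the authors' implementation, and the network sizes, regularization weights, and the placeholder critic are hypothetical.

```python
# Minimal sketch (not the paper's code): hierarchical policy update with
# KL regularization toward priors pretrained on offline demonstrations.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Maps an input (state, or state + skill) to a diagonal Gaussian."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * out_dim))
        self.out_dim = out_dim

    def dist(self, x):
        mean, log_std = self.net(x).split(self.out_dim, dim=-1)
        return torch.distributions.Normal(mean, log_std.clamp(-5, 2).exp())

# High-level policy picks a latent skill z; low-level policy decodes (state, z) -> action.
state_dim, skill_dim, action_dim = 10, 4, 2
high_policy = GaussianPolicy(state_dim, skill_dim)
low_policy  = GaussianPolicy(state_dim + skill_dim, action_dim)
high_prior  = GaussianPolicy(state_dim, skill_dim)               # stand-in for a pretrained skill prior
low_prior   = GaussianPolicy(state_dim + skill_dim, action_dim)  # stand-in for a pretrained skill decoder
# Only the two policies are optimized; the priors stay fixed as regularization targets.
opt = torch.optim.Adam(list(high_policy.parameters()) + list(low_policy.parameters()), lr=3e-4)
alpha_h, alpha_l = 0.1, 0.1  # regularization weights (hypothetical values)

def policy_loss(state, q_value_fn):
    """One regularized actor update: maximize Q while staying close to the offline priors."""
    z = high_policy.dist(state).rsample()
    a = low_policy.dist(torch.cat([state, z], dim=-1)).rsample()
    kl_h = torch.distributions.kl_divergence(high_policy.dist(state),
                                             high_prior.dist(state)).sum(-1)
    kl_l = torch.distributions.kl_divergence(low_policy.dist(torch.cat([state, z], dim=-1)),
                                             low_prior.dist(torch.cat([state, z], dim=-1))).sum(-1)
    return (-q_value_fn(state, z, a) + alpha_h * kl_h + alpha_l * kl_l).mean()

# Example update with a placeholder critic (a real critic would be learned from environment reward).
states = torch.randn(32, state_dim)
loss = policy_loss(states, lambda s, z, a: -(a ** 2).sum(-1))
opt.zero_grad(); loss.backward(); opt.step()
```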

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, 2024-04, Vol. 9 (4), p. 3625-3632
Main Authors: Hao, Ce; Weaver, Catherine; Tang, Chen; Kawamoto, Kenta; Tomizuka, Masayoshi; Zhan, Wei
Format: Article
Language: English
ISSN: 2377-3766
DOI: 10.1109/LRA.2024.3368231
Subjects: Algorithms; Decoding; Optimization; Policies; Regularization; Reinforcement learning; representation learning; Skills; Sports; Task analysis; Training; transfer learning
Online Access: https://doi.org/10.1109/LRA.2024.3368231