
Practice Makes Perfect: Planning to Learn Skill Parameter Policies

One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given...

Bibliographic Details
Published in:	arXiv.org 2024-05
Main Authors:	Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack; Barry, Jennifer
Format:	Article
Language:	English
Subjects:	Parameterization; Parameters; Policies; Robot control; Robots; Skills; Task complexity

container_title	arXiv.org
creator	Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack; Barry, Jennifer
description	One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Experiments in the real world demonstrate our approach's ability to handle noise from perception and control and improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
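The estimate, extrapolate, and situate steps named in the description can be illustrated with a toy sketch. This is not the authors' implementation: the Beta-posterior competence estimate, the fixed-fraction extrapolation rule, the product-of-competences plan score, and every function and skill name below are assumptions made only for illustration.

```python
from math import prod


def estimate_competence(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior-mean success rate under an assumed Beta(prior_a, prior_b) prior."""
    return (successes + prior_a) / (successes + failures + prior_a + prior_b)


def extrapolate_competence(competence, gain=0.5):
    """Assumed extrapolation rule: practice closes a fixed fraction of the gap to 1."""
    return competence + gain * (1.0 - competence)


def plan_success(plan, competence):
    """Assumed plan score: skills succeed independently, so probabilities multiply."""
    return prod(competence[skill] for skill in plan)


def expected_task_success(task_plans, competence):
    """Average plan success over a sample of plans from the task distribution."""
    return sum(plan_success(p, competence) for p in task_plans) / len(task_plans)


def choose_skill_to_practice(counts, task_plans, gain=0.5):
    """Pick the skill whose extrapolated competence most raises expected task success."""
    current = {s: estimate_competence(w, l) for s, (w, l) in counts.items()}
    baseline = expected_task_success(task_plans, current)
    best_skill, best_delta = None, float("-inf")
    for skill in sorted(current):
        improved = dict(current)
        improved[skill] = extrapolate_competence(current[skill], gain)
        delta = expected_task_success(task_plans, improved) - baseline
        if delta > best_delta:
            best_skill, best_delta = skill, delta
    return best_skill, best_delta


# Hypothetical (successes, failures) records and sampled plans, for illustration only.
counts = {"pick": (8, 2), "place": (2, 8), "sweep": (1, 9)}
task_plans = [["pick", "place"], ["pick", "place"], ["pick", "sweep"]]
skill, delta = choose_skill_to_practice(counts, task_plans)
```

In this toy setup the weakest skill is not automatically the one chosen: a skill is favored when it is both improvable and used often in the sampled plans, which is the intuition behind situating competence in the task distribution.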
format	article
fullrecord	<record><display><type>article</type><title>Practice Makes Perfect: Planning to Learn Skill Parameter Policies</title><source>Publicly Available Content Database</source><creator>Kumar, Nishanth ; Silver, Tom ; McClinton, Willie ; Zhao, Linfeng ; Proulx, Stephen ; Lozano-Pérez, Tomás ; Leslie Pack Kaelbling ; Barry, Jennifer</creator><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Parameterization ; Parameters ; Policies ; Robot control ; Robots ; Skills ; Task complexity</subject><ispartof>arXiv.org, 2024-05</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa></display><addata><date>2024-05-18</date><eissn>2331-8422</eissn></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-05
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2931833851
source	Publicly Available Content Database
subjects	Parameterization; Parameters; Policies; Robot control; Robots; Skills; Task complexity
title	Practice Makes Perfect: Planning to Learn Skill Parameter Policies
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T03%3A28%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Practice%20Makes%20Perfect:%20Planning%20to%20Learn%20Skill%20Parameter%20Policies&rft.jtitle=arXiv.org&rft.au=Kumar,%20Nishanth&rft.date=2024-05-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2931833851%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_29318338513%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2931833851&rft_id=info:pmid/&rfr_iscdi=true