Practice Makes Perfect: Planning to Learn Skill Parameter Policies
One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given...
Published in: | arXiv.org 2024-05 |
---|---|
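Main Authors: | Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack; Barry, Jennifer |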
Format: | Article |
Language: | English |
Subjects: | Parameterization; Parameters; Policies; Robot control; Robots; Skills; Task complexity |
cited_by | |
---|---|
container_title | arXiv.org |
creator | Kumar, Nishanth; Silver, Tom; McClinton, Willie; Zhao, Linfeng; Proulx, Stephen; Lozano-Pérez, Tomás; Kaelbling, Leslie Pack; Barry, Jennifer |
description | One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Experiments in the real world demonstrate our approach's ability to handle noise from perception and control and improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu |
format | article |
creationdate | 2024-05-18 |
publisher | Ithaca: Cornell University Library, arXiv.org |
rights | 2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
oa | free_for_read |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-05 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2931833851 |
source | Publicly Available Content Database |
subjects | Parameterization; Parameters; Policies; Robot control; Robots; Skills; Task complexity |
title | Practice Makes Perfect: Planning to Learn Skill Parameter Policies |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T03%3A28%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Practice%20Makes%20Perfect:%20Planning%20to%20Learn%20Skill%20Parameter%20Policies&rft.jtitle=arXiv.org&rft.au=Kumar,%20Nishanth&rft.date=2024-05-18&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2931833851%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_29318338513%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2931833851&rft_id=info:pmid/&rfr_iscdi=true |