Asymmetric self-play for automatic goal discovery in robotic manipulation
We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum, and Bob can learn from Alice's trajectory when it is relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of the learned policy are available at https://robotics-self-play.github.io.
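The abstract describes a concrete training loop: Alice acts to propose a goal, Bob tries to reach it under a sparse success signal, and Alice's trajectory is relabeled as a demonstration when Bob fails. The following minimal Python sketch illustrates that loop under stated assumptions; the environment interface (`env.reset`, `env.step`, `env.goal_reached`), the policy methods, and the zero-sum reward scheme are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of one asymmetric self-play round.
# Assumed (hypothetical) interfaces: `env` is a resettable manipulation
# environment whose state can serve as a goal; `alice` and `bob` are policies.

def self_play_round(env, alice, bob, horizon=50):
    # Alice's turn: act freely; her final state becomes the proposed goal.
    state = env.reset()
    alice_traj = []
    for _ in range(horizon):
        action = alice.act(state)
        alice_traj.append((state, action))
        state = env.step(action)
    goal = state  # Alice's final state is the goal Bob must reach.

    # Bob's turn: start from the same initial state and try to reach the goal.
    state = env.reset()
    solved = False
    for _ in range(horizon):
        action = bob.act(state, goal)
        state = env.step(action)
        if env.goal_reached(state, goal):  # sparse success signal
            solved = True
            break

    # Sparse rewards: Bob is rewarded only on success; Alice is rewarded
    # when she proposes a goal Bob cannot yet solve (assumed scheme), which
    # pushes goal difficulty just past Bob's ability -- a natural curriculum.
    bob.update(reward=1.0 if solved else 0.0, goal=goal)
    alice.update(reward=0.0 if solved else 1.0)

    # Relabeling: if Bob failed, Alice's own trajectory shows how to reach
    # `goal`, so it can be reused as a goal-conditioned demonstration.
    if not solved:
        demo = [(s, a, goal) for (s, a) in alice_traj]
        bob.learn_from_demonstration(demo)
```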
Published in: | arXiv.org, 2021-01-13
---|---
Main Authors: | OpenAI; Matthias Plappert; Raul Sampedro; Tao Xu; Ilge Akkaya; Vineet Kosaraju; Peter Welinder; Ruben D'Sa; Arthur Petron; Henrique P. d. O. Pinto; Alex Paino; Hyeonwoo Noh; Lilian Weng; Qiming Yuan; Casey Chu; Wojciech Zaremba
Format: | Article
Language: | English
Subjects: | Asymmetry; Curricula; Robotics
Identifier: | EISSN: 2331-8422
Publisher: | Cornell University Library, arXiv.org (Ithaca)
Source: | Publicly Available Content Database
Online Access: | Get full text