
Asymmetric self-play for automatic goal discovery in robotic manipulation

We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.


Bibliographic Details
Published in: arXiv.org, 2021-01
Main Authors: OpenAI, Plappert, Matthias, Sampedro, Raul, Xu, Tao, Akkaya, Ilge, Kosaraju, Vineet, Welinder, Peter, D'Sa, Ruben, Petron, Arthur, Henrique P d O Pinto, Paino, Alex, Noh, Hyeonwoo, Weng, Lilian, Yuan, Qiming, Chu, Casey, Zaremba, Wojciech
Format: Article
Language: English
Subjects: Asymmetry; Curricula; Robotics
Online Access: Get full text
cited_by
cites
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator OpenAI
Plappert, Matthias
Sampedro, Raul
Xu, Tao
Akkaya, Ilge
Kosaraju, Vineet
Welinder, Peter
D'Sa, Ruben
Petron, Arthur
Henrique P d O Pinto
Paino, Alex
Noh, Hyeonwoo
Weng, Lilian
Yuan, Qiming
Chu, Casey
Zaremba, Wojciech
description We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.
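As a rough illustration of the loop the description outlines, the sketch below mimics the Alice–Bob game in Python: Alice's final state is relabeled as Bob's goal, rewards are sparse and competitive, and Alice's trajectory is reused as a goal-conditioned demonstration when Bob fails. All names here (Policy, run_episode, self_play_iteration) are hypothetical stand-ins chosen for the example, not the authors' implementation.

```python
# Minimal sketch of asymmetric self-play with goal relabeling.
# Hypothetical toy classes; a real system would use learned policies and a simulator.
import random


class Policy:
    """Placeholder goal-conditioned policy; a real agent would be a neural network."""

    def act(self, observation, goal=None):
        # Random action stands in for a learned, goal-conditioned action.
        return random.choice(["push", "pick", "place", "rotate"])

    def update(self, trajectory, reward):
        pass  # Reinforcement-learning update on the sparse reward would go here.

    def behavioral_cloning(self, demonstration, goal):
        pass  # Supervised update on Alice's relabeled trajectory would go here.


def run_episode(initial_state, policy, goal=None, horizon=10):
    """Roll out a policy from the initial state and return the visited states."""
    trajectory = [initial_state]
    for _ in range(horizon):
        action = policy.act(trajectory[-1], goal)
        trajectory.append(f"{trajectory[-1]}->{action}")
    return trajectory


def self_play_iteration(alice, bob, initial_state, goal_reached):
    # 1. Alice acts freely; her final state becomes the proposed goal.
    alice_traj = run_episode(initial_state, alice)
    goal = alice_traj[-1]

    # 2. Bob is reset to the same initial state and must reach Alice's goal.
    bob_traj = run_episode(initial_state, bob, goal=goal)
    bob_success = goal_reached(bob_traj[-1], goal)

    # 3. Sparse, competitive rewards: Alice is rewarded when Bob fails, which
    #    pushes her toward progressively harder goals and yields a curriculum.
    bob.update(bob_traj, reward=1.0 if bob_success else 0.0)
    alice.update(alice_traj, reward=0.0 if bob_success else 1.0)

    # 4. If Bob failed, relabel Alice's trajectory as a goal-conditioned
    #    demonstration of how to reach that goal and clone it.
    if not bob_success:
        bob.behavioral_cloning(alice_traj, goal)
    return goal, bob_success


if __name__ == "__main__":
    alice, bob = Policy(), Policy()
    matches = lambda state, goal: state == goal  # toy success check
    for i in range(3):
        goal, solved = self_play_iteration(alice, bob, "init", matches)
        print(f"iteration {i}: goal={goal!r} solved={solved}")
```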
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-01
issn 2331-8422
language eng
recordid cdi_proquest_journals_2477834727
source Publicly Available Content Database
subjects Asymmetry
Curricula
Robotics
title Asymmetric self-play for automatic goal discovery in robotic manipulation