
Asymmetric self-play for automatic goal discovery in robotic manipulation

We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.


Bibliographic Details
Published in: arXiv.org, 2021-01
Main Authors: OpenAI, Plappert, Matthias, Sampedro, Raul, Xu, Tao, Akkaya, Ilge, Kosaraju, Vineet, Welinder, Peter, D'Sa, Ruben, Petron, Arthur, Henrique P d O Pinto, Paino, Alex, Noh, Hyeonwoo, Weng, Lilian, Yuan, Qiming, Chu, Casey, Zaremba, Wojciech
Format: Article
Language: English
Subjects: Asymmetry; Curricula; Robotics
Online Access: Get full text
cited_by
cites
container_end_page
container_issue
container_start_page
container_title arXiv.org
container_volume
creator OpenAI
Plappert, Matthias
Sampedro, Raul
Xu, Tao
Akkaya, Ilge
Kosaraju, Vineet
Welinder, Peter
D'Sa, Ruben
Petron, Arthur
Henrique P d O Pinto
Paino, Alex
Noh, Hyeonwoo
Weng, Lilian
Yuan, Qiming
Chu, Casey
Zaremba, Wojciech
description We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. We rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method can discover highly diverse and complex goals without any human priors. Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when relabeled as a goal-conditioned demonstration. Finally, our method scales, resulting in a single policy that can generalize to many unseen tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.
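As a rough illustration of the loop the description outlines, the sketch below mimics the Alice–Bob game in Python: Alice's final state is relabeled as Bob's goal, rewards are sparse and competitive, and Alice's trajectory is reused as a goal-conditioned demonstration when Bob fails. All names here (Policy, run_episode, self_play_iteration) are hypothetical stand-ins chosen for the example, not the authors' implementation.

```python
# Minimal sketch of asymmetric self-play with goal relabeling.
# Hypothetical toy classes; a real system would use learned policies and a simulator.
import random


class Policy:
    """Placeholder goal-conditioned policy; a real agent would be a neural network."""

    def act(self, observation, goal=None):
        # Random action stands in for a learned, goal-conditioned action.
        return random.choice(["push", "pick", "place", "rotate"])

    def update(self, trajectory, reward):
        pass  # Reinforcement-learning update on the sparse reward would go here.

    def behavioral_cloning(self, demonstration, goal):
        pass  # Supervised update on Alice's relabeled trajectory would go here.


def run_episode(initial_state, policy, goal=None, horizon=10):
    """Roll out a policy from the initial state and return the visited states."""
    trajectory = [initial_state]
    for _ in range(horizon):
        action = policy.act(trajectory[-1], goal)
        trajectory.append(f"{trajectory[-1]}->{action}")
    return trajectory


def self_play_iteration(alice, bob, initial_state, goal_reached):
    # 1. Alice acts freely; her final state becomes the proposed goal.
    alice_traj = run_episode(initial_state, alice)
    goal = alice_traj[-1]

    # 2. Bob is reset to the same initial state and must reach Alice's goal.
    bob_traj = run_episode(initial_state, bob, goal=goal)
    bob_success = goal_reached(bob_traj[-1], goal)

    # 3. Sparse, competitive rewards: Alice is rewarded when Bob fails, which
    #    pushes her toward progressively harder goals and yields a curriculum.
    bob.update(bob_traj, reward=1.0 if bob_success else 0.0)
    alice.update(alice_traj, reward=0.0 if bob_success else 1.0)

    # 4. If Bob failed, relabel Alice's trajectory as a goal-conditioned
    #    demonstration of how to reach that goal and clone it.
    if not bob_success:
        bob.behavioral_cloning(alice_traj, goal)
    return goal, bob_success


if __name__ == "__main__":
    alice, bob = Policy(), Policy()
    matches = lambda state, goal: state == goal  # toy success check
    for i in range(3):
        goal, solved = self_play_iteration(alice, bob, "init", matches)
        print(f"iteration {i}: goal={goal!r} solved={solved}")
```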
format article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-01
issn 2331-8422
language eng
recordid cdi_proquest_journals_2477834727
source Publicly Available Content Database
subjects Asymmetry
Curricula
Robotics
title Asymmetric self-play for automatic goal discovery in robotic manipulation