Generalization to New Sequential Decision Making Tasks with In-Context Learning

Bibliographic Details
Published in: arXiv.org, 2023-12-06
Main Authors: Raparthy, Sharath Chandra; Hambro, Eric; Kirk, Robert; Henaff, Mikael; Raileanu, Roberta
Format: Article
Language: English
Subjects: Context; Datasets; Decision making; Machine learning; Transformers
Identifier: EISSN 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Rights: 2023. Published under the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0/).
Source: ProQuest - Publicly Available Content Database
Online Access: https://www.proquest.com/docview/2899514390
Abstract: Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning. Recently, transformers have been shown to learn new language or vision tasks from only a few examples, without any weight updates, a capability referred to as in-context learning. However, the sequential decision making setting poses additional challenges, since it has a lower tolerance for errors: the environment's stochasticity or the agent's own actions can lead to unseen, and sometimes unrecoverable, states. In this paper, we use an illustrative example to show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks. We then demonstrate how training on sequences of trajectories with certain distributional properties leads to in-context learning of new sequential decision making tasks. We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks. By training on large, diverse offline datasets, our model learns new MiniHack and Procgen tasks without any weight updates from just a handful of demonstrations.
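The training recipe the abstract alludes to, sampling sequences of trajectories with controlled "burstiness", can be illustrated with a small data-sampling routine. The Python sketch below is not taken from the paper; the names (`sample_training_sequence`, `trajectories_by_task`, `p_burst`) are hypothetical, and it shows only one plausible way such sequences might be assembled.

```python
import random

# A minimal sketch of "trajectory burstiness", assuming a hypothetical dataset
# `trajectories_by_task` that maps each task id to a list of trajectories
# (each trajectory being, e.g., a list of (observation, action) pairs).
# None of these names come from the paper; they are illustrative only.

def sample_training_sequence(trajectories_by_task, n_context=3, p_burst=0.5):
    """Build one training sequence of trajectories for the transformer.

    With probability `p_burst` the context trajectories come from the same
    task as the query trajectory ("bursty"), so predicting the query's
    actions is helped by attending to the context; otherwise the context
    is drawn from random tasks.
    """
    tasks = list(trajectories_by_task)
    query_task = random.choice(tasks)
    query = random.choice(trajectories_by_task[query_task])

    if random.random() < p_burst:
        # Bursty sequence: same-task demonstrations in the context.
        pool = [t for t in trajectories_by_task[query_task] if t is not query]
        context = random.sample(pool, k=min(n_context, len(pool)))
    else:
        # Non-bursty sequence: context trajectories from random tasks.
        context = [random.choice(trajectories_by_task[random.choice(tasks)])
                   for _ in range(n_context)]

    # The model is trained to predict the query trajectory's actions given
    # the concatenated context trajectories as its prompt.
    return context, query
```

Under this formatting, the same interface can be reused at test time: a handful of demonstrations of an unseen task is placed in the context, and the model imitates them with no weight updates.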