Multi-Task Imitation Learning for Linear Dynamical Systems
We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared \(k\)-dimensional representation is learned from \(H\) source policies, and (b) a target policy...
Published in: | arXiv.org 2023-11 |
---|---|
Main Authors: | Zhang, Thomas T; Kang, Katie; Lee, Bruce D; Tomlin, Claire; Levine, Sergey; Tu, Stephen; Matni, Nikolai |
Format: | Article |
Language: | English |
Subjects: | Dynamical systems; Learning; Linear systems; Representations |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Zhang, Thomas T; Kang, Katie; Lee, Bruce D; Tomlin, Claire; Levine, Sergey; Tu, Stephen; Matni, Nikolai |
description | We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared \(k\)-dimensional representation is learned from \(H\) source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by \(\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)\), where \(n_x > k\) is the state dimension, \(n_u\) is the input dimension, \(N_{\mathrm{shared}}\) denotes the total amount of data collected for each policy during representation learning, and \(N_{\mathrm{target}}\) is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2766567749 |
source | Publicly Available Content Database |
subjects | Dynamical systems; Learning; Linear systems; Representations |
title | Multi-Task Imitation Learning for Linear Dynamical Systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T14%3A19%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Multi-Task%20Imitation%20Learning%20for%20Linear%20Dynamical%20Systems&rft.jtitle=arXiv.org&rft.au=Zhang,%20Thomas%20T&rft.date=2023-11-10&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2766567749%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_27665677493%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2766567749&rft_id=info:pmid/&rfr_iscdi=true |
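The abstract's imitation-gap bound, \(\tilde{O}(k n_x / (H N_{\mathrm{shared}}) + k n_u / N_{\mathrm{target}})\), can be illustrated with a short sketch. This is not code from the paper; it only evaluates the bound's leading-order terms (constants and log factors dropped), and every parameter value below is invented for illustration:

```python
def imitation_gap_bound(k, n_x, n_u, H, N_shared, N_target):
    """Leading-order terms of the abstract's bound, ignoring constants and log factors."""
    assert n_x > k, "the abstract assumes state dimension n_x exceeds representation dimension k"
    # Representation-learning term: shrinks as more source policies H contribute data.
    pretraining_term = k * n_x / (H * N_shared)
    # Fine-tuning term: depends only on the amount of target-task data.
    finetuning_term = k * n_u / N_target
    return pretraining_term + finetuning_term

# Aggregating data across more source tasks (larger H) shrinks the shared term,
# matching the paper's intuition about multi-task sample efficiency.
few_sources = imitation_gap_bound(k=5, n_x=20, n_u=3, H=2, N_shared=100, N_target=50)
many_sources = imitation_gap_bound(k=5, n_x=20, n_u=3, H=20, N_shared=100, N_target=50)
assert many_sources < few_sources
```

Note that the second term is unaffected by `H`: past a point, only more target-task data (`N_target`) reduces the bound further.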