DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets. However, in many cases, the offline dataset contains very limited optimal trajectories, which poses a challenge for offline RL algorithms as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms. Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
Published in: | arXiv.org, 2024-02 |
---|---|
Main Authors: | Li, Guanghe; Shan, Yixiang; Zhu, Zhengbang; Long, Ting; Zhang, Weinan |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Data augmentation; Datasets; Machine learning; Stitching; Trajectory optimization |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Li, Guanghe; Shan, Yixiang; Zhu, Zhengbang; Long, Ting; Zhang, Weinan |
description | In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets. However, in many cases, the offline dataset contains very limited optimal trajectories, which poses a challenge for offline RL algorithms as agents must acquire the ability to transit to high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms. Empirical experiments conducted on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch demonstrates substantial enhancements in the performance of one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT). |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-02 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2922658986 |
source | Publicly Available Content Database |
subjects | Algorithms; Data augmentation; Datasets; Machine learning; Stitching; Trajectory optimization |
title | DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching |
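The abstract describes stitching a low-reward trajectory to a high-reward one by generating bridging transitions between them. As a schematic illustration only (the paper's actual method samples these transitions from a learned diffusion model; the linear-interpolation bridge, function names, and toy data below are all hypothetical), the idea can be sketched as:

```python
import numpy as np

def stitch_trajectories(low_traj, high_traj, n_bridge=3, seed=0):
    """Toy stitch: connect the end of a low-reward trajectory to the
    start of a high-reward trajectory with bridge states. Here the
    bridge is noisy linear interpolation; DiffStitch would instead
    generate these transitions with a trained diffusion model."""
    rng = np.random.default_rng(seed)
    start = low_traj[-1]   # last state of the low-reward trajectory
    end = high_traj[0]     # first state of the high-reward trajectory
    # Interpolation weights strictly between 0 and 1, one per bridge state.
    alphas = np.linspace(0.0, 1.0, n_bridge + 2)[1:-1]
    bridge = [(1 - a) * start + a * end + 0.01 * rng.standard_normal(start.shape)
              for a in alphas]
    # The stitched trajectory: low-reward prefix, bridge, high-reward suffix.
    return np.vstack([low_traj, bridge, high_traj])

low = np.zeros((4, 2))    # toy low-reward trajectory: 4 states in a 2-D space
high = np.ones((4, 2))    # toy high-reward trajectory
stitched = stitch_trajectories(low, high, n_bridge=3)
print(stitched.shape)     # (11, 2): 4 original + 3 bridge + 4 original states
```

Augmenting the offline dataset with such stitched trajectories is what lets one-step, imitation-learning, and trajectory-optimization methods see paths into high-reward regions that the original data never demonstrated.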