
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching

In offline reinforcement learning (RL), the performance of the learned policy depends heavily on the quality of the offline dataset. In many cases, however, the dataset contains very few optimal trajectories, which poses a challenge for offline RL algorithms because agents must still learn how to transition into high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories and thereby mitigating the challenges faced by offline RL algorithms. Experiments on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch yields substantial performance gains for one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
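
The abstract describes stitching at a high level: take the end of a low-reward trajectory and the start of a high-reward one, and generate the transitions that bridge them. The sketch below illustrates only that idea; it is hypothetical and does not reflect the authors' code. In particular, `generate_bridge` here is a plain linear-interpolation stand-in for the conditional diffusion model that DiffStitch actually trains, and reward/action relabeling of the generated states is omitted.

```python
import numpy as np

def generate_bridge(start_state, goal_state, horizon):
    """Placeholder for a learned generator (a conditional diffusion model in
    DiffStitch) that produces `horizon` intermediate states bridging
    start_state -> goal_state. Here: simple linear interpolation."""
    alphas = np.linspace(0.0, 1.0, horizon + 2)[1:-1]
    return [(1 - a) * start_state + a * goal_state for a in alphas]

def stitch(low_reward_traj, high_reward_traj, horizon=8):
    """Connect the end of a low-reward trajectory to the start of a
    high-reward one via generated states, yielding one longer trajectory."""
    start = low_reward_traj[-1]
    goal = high_reward_traj[0]
    bridge = generate_bridge(start, goal, horizon)
    return list(low_reward_traj) + bridge + list(high_reward_traj)

# Toy usage: states are 2-D vectors. In the full pipeline the stitched
# trajectory would still need actions and rewards (e.g. from an inverse-
# dynamics model and a reward model) before being added to the offline
# dataset used by IQL, TD3+BC, or DT.
low = [np.array([0.0, 0.0]), np.array([0.5, 0.1])]
high = [np.array([3.0, 2.0]), np.array([3.5, 2.5])]
augmented = stitch(low, high)
print(len(augmented))  # 12 states: 2 original + 8 generated + 2 original
```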

Bibliographic Details
Published in: arXiv.org, 2024-02
Main Authors: Li, Guanghe; Shan, Yixiang; Zhu, Zhengbang; Long, Ting; Zhang, Weinan
Format: Article
Language: English
Subjects: Algorithms; Data augmentation; Datasets; Machine learning; Stitching; Trajectory optimization
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)