
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching

In offline reinforcement learning (RL), the performance of the learned policy depends heavily on the quality of the offline dataset. In many cases, however, the dataset contains very few optimal trajectories, which poses a challenge for offline RL algorithms because agents must still learn how to transition into high-reward regions. To address this issue, we introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline that systematically generates stitching transitions between trajectories. DiffStitch connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories and thereby mitigating the challenges faced by offline RL algorithms. Experiments on D4RL datasets demonstrate the effectiveness of DiffStitch across RL methodologies. Notably, DiffStitch yields substantial performance gains for one-step methods (IQL), imitation learning methods (TD3+BC), and trajectory optimization methods (DT).
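
The abstract describes stitching at a high level: take the end of a low-reward trajectory and the start of a high-reward one, and generate the transitions that bridge them. The sketch below illustrates only that idea; it is hypothetical and does not reflect the authors' code. In particular, `generate_bridge` here is a plain linear-interpolation stand-in for the conditional diffusion model that DiffStitch actually trains, and reward/action relabeling of the generated states is omitted.

```python
import numpy as np

def generate_bridge(start_state, goal_state, horizon):
    """Placeholder for a learned generator (a conditional diffusion model in
    DiffStitch) that produces `horizon` intermediate states bridging
    start_state -> goal_state. Here: simple linear interpolation."""
    alphas = np.linspace(0.0, 1.0, horizon + 2)[1:-1]
    return [(1 - a) * start_state + a * goal_state for a in alphas]

def stitch(low_reward_traj, high_reward_traj, horizon=8):
    """Connect the end of a low-reward trajectory to the start of a
    high-reward one via generated states, yielding one longer trajectory."""
    start = low_reward_traj[-1]
    goal = high_reward_traj[0]
    bridge = generate_bridge(start, goal, horizon)
    return list(low_reward_traj) + bridge + list(high_reward_traj)

# Toy usage: states are 2-D vectors. In the full pipeline the stitched
# trajectory would still need actions and rewards (e.g. from an inverse-
# dynamics model and a reward model) before being added to the offline
# dataset used by IQL, TD3+BC, or DT.
low = [np.array([0.0, 0.0]), np.array([0.5, 0.1])]
high = [np.array([3.0, 2.0]), np.array([3.5, 2.5])]
augmented = stitch(low, high)
print(len(augmented))  # 12 states: 2 original + 8 generated + 2 original
```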

Bibliographic Details
Published in: arXiv.org, 2024-02
Main Authors: Li, Guanghe; Shan, Yixiang; Zhu, Zhengbang; Long, Ting; Zhang, Weinan
Format: Article
Language: English
Subjects: Algorithms; Data augmentation; Datasets; Machine learning; Stitching; Trajectory optimization
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)