Loading…
Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks
Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the s...
Saved in:
Published in: | arXiv.org 2022-12 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Nguyen-Tang, Thanh Gupta, Sunil Hung Tran-The Venkatesh, Svetha |
description | Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of \(n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )\) for offline RL with deep ReLU networks, where \(\kappa_{\mu}\) is a measure of distributional shift, {\(H = (1-\gamma)^{-1}\) is the effective horizon length}, \(d\) is the dimension of the state-action space, \(\alpha\) is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and \(\epsilon\) is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient {in long (effective) horizon problems}. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond {the linearity regime} in the traditional Reproducing Hilbert kernel spaces and Neural Tangent Kernels. |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2500710711</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2500710711</sourcerecordid><originalsourceid>FETCH-proquest_journals_25007107113</originalsourceid><addsrcrecordid>eNqNitEKgjAUQEcQJOU_XOhZmFtm71YESUHZs0jc1Uy3tU2sv8-gDwgOnIdzRiRgnMfRasHYhITO1ZRStkxZkvCA7M9VaxqETH_1kv4NWsBRiEYqhBNKJbS9YovKQ46VVVLdoJf-DmtEMwz5BQ7oe20fbkbGomochj9PyXy7KbJdZKx-duh8WevOqiGVLKE0jQdi_t_1Ae3NPLg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2500710711</pqid></control><display><type>article</type><title>Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks</title><source>Publicly Available Content Database</source><creator>Nguyen-Tang, Thanh ; Gupta, Sunil ; Hung Tran-The ; Venkatesh, Svetha</creator><creatorcontrib>Nguyen-Tang, Thanh ; Gupta, Sunil ; Hung Tran-The ; Venkatesh, Svetha</creatorcontrib><description>Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of \(n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )\) for offline RL with deep ReLU networks, where \(\kappa_{\mu}\) is a measure of distributional shift, {\(H = (1-\gamma)^{-1}\) is the effective horizon length}, \(d\) is the dimension of the state-action space, \(\alpha\) is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and \(\epsilon\) is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient {in long (effective) horizon problems}. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond {the linearity regime} in the traditional Reproducing Hilbert kernel spaces and Neural Tangent Kernels.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Approximation ; Machine learning ; Networks ; Regression analysis ; Regularity ; Statistical analysis</subject><ispartof>arXiv.org, 2022-12</ispartof><rights>2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2500710711?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>776,780,25731,36989,44566</link.rule.ids></links><search><creatorcontrib>Nguyen-Tang, Thanh</creatorcontrib><creatorcontrib>Gupta, Sunil</creatorcontrib><creatorcontrib>Hung Tran-The</creatorcontrib><creatorcontrib>Venkatesh, Svetha</creatorcontrib><title>Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks</title><title>arXiv.org</title><description>Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of \(n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )\) for offline RL with deep ReLU networks, where \(\kappa_{\mu}\) is a measure of distributional shift, {\(H = (1-\gamma)^{-1}\) is the effective horizon length}, \(d\) is the dimension of the state-action space, \(\alpha\) is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and \(\epsilon\) is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient {in long (effective) horizon problems}. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond {the linearity regime} in the traditional Reproducing Hilbert kernel spaces and Neural Tangent Kernels.</description><subject>Algorithms</subject><subject>Approximation</subject><subject>Machine learning</subject><subject>Networks</subject><subject>Regression analysis</subject><subject>Regularity</subject><subject>Statistical analysis</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNitEKgjAUQEcQJOU_XOhZmFtm71YESUHZs0jc1Uy3tU2sv8-gDwgOnIdzRiRgnMfRasHYhITO1ZRStkxZkvCA7M9VaxqETH_1kv4NWsBRiEYqhBNKJbS9YovKQ46VVVLdoJf-DmtEMwz5BQ7oe20fbkbGomochj9PyXy7KbJdZKx-duh8WevOqiGVLKE0jQdi_t_1Ae3NPLg</recordid><startdate>20221213</startdate><enddate>20221213</enddate><creator>Nguyen-Tang, Thanh</creator><creator>Gupta, Sunil</creator><creator>Hung Tran-The</creator><creator>Venkatesh, Svetha</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20221213</creationdate><title>Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks</title><author>Nguyen-Tang, Thanh ; Gupta, Sunil ; Hung Tran-The ; Venkatesh, Svetha</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_25007107113</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Algorithms</topic><topic>Approximation</topic><topic>Machine learning</topic><topic>Networks</topic><topic>Regression analysis</topic><topic>Regularity</topic><topic>Statistical analysis</topic><toplevel>online_resources</toplevel><creatorcontrib>Nguyen-Tang, Thanh</creatorcontrib><creatorcontrib>Gupta, Sunil</creatorcontrib><creatorcontrib>Hung Tran-The</creatorcontrib><creatorcontrib>Venkatesh, Svetha</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>ProQuest Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nguyen-Tang, Thanh</au><au>Gupta, Sunil</au><au>Hung Tran-The</au><au>Venkatesh, Svetha</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks</atitle><jtitle>arXiv.org</jtitle><date>2022-12-13</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of \(n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )\) for offline RL with deep ReLU networks, where \(\kappa_{\mu}\) is a measure of distributional shift, {\(H = (1-\gamma)^{-1}\) is the effective horizon length}, \(d\) is the dimension of the state-action space, \(\alpha\) is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and \(\epsilon\) is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient {in long (effective) horizon problems}. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond {the linearity regime} in the traditional Reproducing Hilbert kernel spaces and Neural Tangent Kernels.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-12 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2500710711 |
source | Publicly Available Content Database |
subjects | Algorithms Approximation Machine learning Networks Regression analysis Regularity Statistical analysis |
title | Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T15%3A49%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Sample%20Complexity%20of%20Offline%20Reinforcement%20Learning%20with%20Deep%20ReLU%20Networks&rft.jtitle=arXiv.org&rft.au=Nguyen-Tang,%20Thanh&rft.date=2022-12-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2500710711%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_25007107113%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2500710711&rft_id=info:pmid/&rfr_iscdi=true |