On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent rates that hold even for the worst-case realization of problem instances. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate for offline RL when there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when the number of episodes exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first results that give a fast rate bound on the sub-optimality and an absolute zero sub-optimality bound for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.
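The abstract describes BCP-VI only at a high level, so the sketch below illustrates just the pessimism mechanism it builds on: value iteration over a linearly parameterized Q-function, where a ridge-regression estimate of each Q_h is penalized by an elliptical uncertainty bonus. This is a minimal sketch, not the paper's BCP-VI; the data-bootstrapping and constrained-optimization components are omitted, and all names and parameters (phi, beta, lam, the data layout) are assumptions made for illustration.

```python
# Minimal sketch of pessimistic value iteration with linear function
# approximation. NOT the paper's BCP-VI: bootstrapping and the constrained
# optimization step are omitted; phi, beta, lam, and the data layout are
# illustrative assumptions.
import numpy as np

def pessimistic_value_iteration(data, phi, H, num_actions, d, beta=1.0, lam=1.0):
    """Offline pessimistic value iteration for an episodic MDP with linear features.

    data: list over steps h = 0..H-1 of tuples (S, A, R, S_next), where S, A, R
          hold the logged states, actions, rewards at step h and S_next the
          successor states; states may be arbitrary objects accepted by phi.
    phi:  feature map phi(s, a) -> np.ndarray of shape (d,).
    Returns a greedy (pessimistic) policy as a list of functions pi_h(s) -> action.
    """
    V_next = lambda s: 0.0                # terminal value V_{H+1} = 0
    policies = [None] * H
    for h in reversed(range(H)):
        S, A, R, S_next = data[h]
        n = len(S)
        Phi = np.stack([phi(S[i], A[i]) for i in range(n)])          # (n, d) design matrix
        y = np.array([R[i] + V_next(S_next[i]) for i in range(n)])   # Bellman regression targets
        Lam = Phi.T @ Phi + lam * np.eye(d)                          # regularized Gram matrix
        w = np.linalg.solve(Lam, Phi.T @ y)                          # ridge estimate of Q_h weights
        Lam_inv = np.linalg.inv(Lam)

        def Q(s, a, w=w, Lam_inv=Lam_inv):
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ Lam_inv @ f)                  # elliptical uncertainty penalty
            return max(f @ w - bonus, 0.0)                           # pessimistic Q estimate

        policies[h] = lambda s, Q=Q: max(range(num_actions), key=lambda a: Q(s, a))
        V_next = lambda s, Q=Q: max(Q(s, a) for a in range(num_actions))
    return policies
```

The penalty beta * sqrt(phi(s,a)^T Lambda^{-1} phi(s,a)) shrinks along feature directions that the offline data covers well, which is how pessimistic methods can tolerate the kind of partial coverage (concentrability with respect to an optimal policy) that the abstract assumes.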
| Published in: | Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023-06, Vol.37 (8), p.9310-9318 |
|---|---|
| Main Authors: | Nguyen-Tang, Thanh; Yin, Ming; Gupta, Sunil; Venkatesh, Svetha; Arora, Raman |
| Format: | Article |
| Language: | English |
| Source: | Freely Accessible Science Journals - check A-Z of ejournals |
| DOI: | 10.1609/aaai.v37i8.26116 |
| ISSN: | 2159-5399 |
| EISSN: | 2374-3468 |
| Online Access: | Get full text |