
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent rates that hold even for the worst-case realization of problem instances. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate for offline RL when there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when the number of episodes exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first results that give a fast rate bound on the sub-optimality and an absolute zero sub-optimality bound for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.
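The abstract only names the paper's algorithm; as a rough, illustrative sketch of the pessimism principle it builds on, the snippet below shows one backward step of pessimistic least-squares value iteration with linear features. It is not the authors' BCP-VI method (which additionally uses data bootstrapping and constrained optimization), and all variable names, parameters, and the synthetic dataset are hypothetical.

```python
# Minimal sketch of one backward step of pessimistic least-squares value
# iteration with linear function approximation. NOT the paper's BCP-VI
# algorithm; it only illustrates the pessimism idea the abstract refers to.
# All names and the synthetic dataset are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

d, n_actions, n_samples = 4, 3, 200
beta, lam = 1.0, 1.0          # pessimism coefficient and ridge parameter

# Offline dataset: features phi(s, a), observed rewards, next-state features.
phi = rng.normal(size=(n_samples, d))
rewards = rng.uniform(size=n_samples)
phi_next = rng.normal(size=(n_samples, n_actions, d))  # phi(s', a) per action

# Weight vector estimated at the previous (later-horizon) backward step.
w_next = np.zeros(d)

# Regression targets: r + max_a <phi(s', a), w_next>.
targets = rewards + np.max(phi_next @ w_next, axis=1)

# Ridge regression: w = Lambda^{-1} Phi^T targets.
Lambda = lam * np.eye(d) + phi.T @ phi
w = np.linalg.solve(Lambda, phi.T @ targets)
Lambda_inv = np.linalg.inv(Lambda)

def pessimistic_q(features: np.ndarray) -> np.ndarray:
    """Linear Q estimate minus an elliptical uncertainty bonus (pessimism)."""
    bonus = beta * np.sqrt(
        np.einsum("...i,ij,...j->...", features, Lambda_inv, features)
    )
    return features @ w - bonus

# Greedy action with respect to the pessimistic Q-values at a new state.
phi_new_state = rng.normal(size=(n_actions, d))
best_action = int(np.argmax(pessimistic_q(phi_new_state)))
print("chosen action:", best_action)
```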


Bibliographic Details
Published in: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023-06, Vol. 37 (8), p. 9310-9318
Main Authors: Nguyen-Tang, Thanh, Yin, Ming, Gupta, Sunil, Venkatesh, Svetha, Arora, Raman
Format: Article
Language:English
DOI: 10.1609/aaai.v37i8.26116
ISSN: 2159-5399
EISSN: 2374-3468