On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent rates that hold even for the worst-case realization of problem instances. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate for offline RL when there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when the number of episodes exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first results that give a fast rate bound on the sub-optimality and an absolute zero sub-optimality bound for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.
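The abstract describes BCP-VI only at a high level, so the sketch below illustrates just the pessimism mechanism it builds on: value iteration over a linearly parameterized Q-function, where a ridge-regression estimate of each Q_h is penalized by an elliptical uncertainty bonus. This is a minimal sketch, not the paper's BCP-VI; the data-bootstrapping and constrained-optimization components are omitted, and all names and parameters (phi, beta, lam, the data layout) are assumptions made for illustration.

```python
# Minimal sketch of pessimistic value iteration with linear function
# approximation. NOT the paper's BCP-VI: bootstrapping and the constrained
# optimization step are omitted; phi, beta, lam, and the data layout are
# illustrative assumptions.
import numpy as np

def pessimistic_value_iteration(data, phi, H, num_actions, d, beta=1.0, lam=1.0):
    """Offline pessimistic value iteration for an episodic MDP with linear features.

    data: list over steps h = 0..H-1 of tuples (S, A, R, S_next), where S, A, R
          hold the logged states, actions, rewards at step h and S_next the
          successor states; states may be arbitrary objects accepted by phi.
    phi:  feature map phi(s, a) -> np.ndarray of shape (d,).
    Returns a greedy (pessimistic) policy as a list of functions pi_h(s) -> action.
    """
    V_next = lambda s: 0.0                # terminal value V_{H+1} = 0
    policies = [None] * H
    for h in reversed(range(H)):
        S, A, R, S_next = data[h]
        n = len(S)
        Phi = np.stack([phi(S[i], A[i]) for i in range(n)])          # (n, d) design matrix
        y = np.array([R[i] + V_next(S_next[i]) for i in range(n)])   # Bellman regression targets
        Lam = Phi.T @ Phi + lam * np.eye(d)                          # regularized Gram matrix
        w = np.linalg.solve(Lam, Phi.T @ y)                          # ridge estimate of Q_h weights
        Lam_inv = np.linalg.inv(Lam)

        def Q(s, a, w=w, Lam_inv=Lam_inv):
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ Lam_inv @ f)                  # elliptical uncertainty penalty
            return max(f @ w - bonus, 0.0)                           # pessimistic Q estimate

        policies[h] = lambda s, Q=Q: max(range(num_actions), key=lambda a: Q(s, a))
        V_next = lambda s, Q=Q: max(Q(s, a) for a in range(num_actions))
    return policies
```

The penalty beta * sqrt(phi(s,a)^T Lambda^{-1} phi(s,a)) shrinks along feature directions that the offline data covers well, which is how pessimistic methods can tolerate the kind of partial coverage (concentrability with respect to an optimal policy) that the abstract assumes.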
| Published in: | Proceedings of the ... AAAI Conference on Artificial Intelligence, 2023-06, Vol.37 (8), p.9310-9318 |
|---|---|
| Main Authors: | Nguyen-Tang, Thanh; Yin, Ming; Gupta, Sunil; Venkatesh, Svetha; Arora, Raman |
| Format: | Article |
| Language: | English |
| Source: | Freely Accessible Science Journals - check A-Z of ejournals |
| DOI: | 10.1609/aaai.v37i8.26116 |
| ISSN: | 2159-5399 |
| EISSN: | 2374-3468 |
| Online Access: | Get full text |