
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.
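For readers unfamiliar with sampling-based exploration, the toy sketch below illustrates the general idea the abstract refers to: draw an approximate posterior sample with a few Langevin Monte Carlo steps, then act greedily with respect to that sample. This is a minimal illustration in a linear-bandit setting, not the paper's FGTS-based algorithm; the setup (Gaussian prior and noise, step size, number of Langevin steps, the toy feature matrix) is an assumption made purely for the example.

```python
# Illustrative sketch only: Thompson-sampling-style exploration driven by
# Langevin Monte Carlo (LMC), one of the approximate samplers mentioned in
# the abstract. Not the paper's FGTS algorithm; all constants are assumed.
import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 5, 10
features = rng.normal(size=(n_actions, d))   # fixed action features
theta_true = rng.normal(size=d)              # unknown reward parameter

def grad_log_posterior(theta, X, y, prior_var=1.0, noise_var=1.0):
    """Gradient of the log posterior for Bayesian linear regression
    with a Gaussian prior and Gaussian observation noise."""
    resid = y - X @ theta
    return X.T @ resid / noise_var - theta / prior_var

def lmc_sample(theta, X, y, step=1e-2, n_steps=50):
    """Run a few unadjusted Langevin steps to obtain an approximate
    posterior sample, warm-started from the previous sample."""
    for _ in range(n_steps):
        noise = np.sqrt(2.0 * step) * rng.normal(size=theta.shape)
        theta = theta + step * grad_log_posterior(theta, X, y) + noise
    return theta

X_hist, y_hist = np.zeros((0, d)), np.zeros(0)
theta = np.zeros(d)
for t in range(200):
    theta = lmc_sample(theta, X_hist, y_hist)   # approximate posterior sample
    a = int(np.argmax(features @ theta))        # act greedily w.r.t. the sample
    r = features[a] @ theta_true + 0.1 * rng.normal()
    X_hist = np.vstack([X_hist, features[a]])
    y_hist = np.append(y_hist, r)

print("parameter estimation error:", np.linalg.norm(theta - theta_true))
```

The random perturbation injected by the Langevin noise is what drives exploration here; the paper's framework generalizes this idea to other approximate samplers and combines it with the Feel-Good Thompson Sampling objective.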

Bibliographic Details
Published in: arXiv.org, 2024-06
Main Authors: Ishfaq, Haque; Tan, Yixin; Yang, Yu; Lan, Qingfeng; Lu, Jianfeng; Mahmood, A. Rupam; Precup, Doina; Xu, Pan
Format: Article
Language: English
Subjects: Algorithms; Decision theory; Machine learning; Markov processes; Samplers; Sampling methods; Task complexity
Identifier: EISSN 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Rights: Published under http://creativecommons.org/licenses/by/4.0/
Source: Publicly Available Content Database
Online Access: Get full text