Distributional Reinforcement Learning via Moment Matching

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in the distributional RL literature. Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which is both restrictive in representation and difficult in maintaining the predefined statistics. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide finite-sample analysis for the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
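The abstract describes matching a return distribution to its Bellman target by minimizing the maximum mean discrepancy between two finite sets of deterministic (pseudo-)samples, i.e. MMD^2(X, Y) = mean k(x, x') + mean k(y, y') - 2 mean k(x, y) for some kernel k. The sketch below is a minimal, hypothetical illustration of such an MMD objective in PyTorch with a Gaussian kernel; the function and parameter names (gaussian_kernel, mmd_loss, bandwidth) and the kernel choice are assumptions for illustration, not taken from the paper.

```python
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # x: (n,) samples, y: (m,) samples; returns the (n, m) Gaussian kernel matrix.
    diff = x.unsqueeze(1) - y.unsqueeze(0)
    return torch.exp(-diff.pow(2) / (2.0 * bandwidth ** 2))

def mmd_loss(pred_samples, target_samples, bandwidth=1.0):
    # Biased estimator of squared MMD between two sets of scalar return samples.
    k_pp = gaussian_kernel(pred_samples, pred_samples, bandwidth).mean()
    k_tt = gaussian_kernel(target_samples, target_samples, bandwidth).mean()
    k_pt = gaussian_kernel(pred_samples, target_samples, bandwidth).mean()
    return k_pp + k_tt - 2.0 * k_pt

# Toy usage: the return distribution Z(s, a) is represented by learned
# pseudo-samples; the target stands in for r + gamma * Z(s', a') from a
# (detached) target network.
pred = torch.randn(30, requires_grad=True)    # learned pseudo-samples
target = 1.0 + 0.99 * torch.randn(30)         # hypothetical Bellman target samples
loss = mmd_loss(pred, target.detach())
loss.backward()
```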

Bibliographic Details
Main Authors: Nguyen-Tang, Thanh; Gupta, Sunil; Venkatesh, Svetha
Format: Conference Proceeding
Language: English
Published in: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2021, Vol. 35 (10), pp. 9144-9152
Publication Date: 2021-05-18
ISSN: 2159-5399
EISSN: 2374-3468
DOI: 10.1609/aaai.v35i10.17104