Distributional Reinforcement Learning via Moment Matching

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in the distributional RL literature. Existing distributional RL methods, however, constrain the learned statistics to predefined functional forms of the return distribution, which is both restrictive in representation and difficult in maintaining the predefined statistics. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide finite-sample analysis for the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
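The abstract describes matching a return distribution to its Bellman target by minimizing the maximum mean discrepancy between two finite sets of deterministic (pseudo-)samples, i.e. MMD^2(X, Y) = mean k(x, x') + mean k(y, y') - 2 mean k(x, y) for some kernel k. The sketch below is a minimal, hypothetical illustration of such an MMD objective in PyTorch with a Gaussian kernel; the function and parameter names (gaussian_kernel, mmd_loss, bandwidth) and the kernel choice are assumptions for illustration, not taken from the paper.

```python
import torch

def gaussian_kernel(x, y, bandwidth=1.0):
    # x: (n,) samples, y: (m,) samples; returns the (n, m) Gaussian kernel matrix.
    diff = x.unsqueeze(1) - y.unsqueeze(0)
    return torch.exp(-diff.pow(2) / (2.0 * bandwidth ** 2))

def mmd_loss(pred_samples, target_samples, bandwidth=1.0):
    # Biased estimator of squared MMD between two sets of scalar return samples.
    k_pp = gaussian_kernel(pred_samples, pred_samples, bandwidth).mean()
    k_tt = gaussian_kernel(target_samples, target_samples, bandwidth).mean()
    k_pt = gaussian_kernel(pred_samples, target_samples, bandwidth).mean()
    return k_pp + k_tt - 2.0 * k_pt

# Toy usage: the return distribution Z(s, a) is represented by learned
# pseudo-samples; the target stands in for r + gamma * Z(s', a') from a
# (detached) target network.
pred = torch.randn(30, requires_grad=True)    # learned pseudo-samples
target = 1.0 + 0.99 * torch.randn(30)         # hypothetical Bellman target samples
loss = mmd_loss(pred, target.detach())
loss.backward()
```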

Bibliographic Details
Main Authors: Nguyen-Tang, Thanh; Gupta, Sunil; Venkatesh, Svetha
Format: Conference Proceeding
Language: English
Published in: Proceedings of the ... AAAI Conference on Artificial Intelligence, 2021, Vol. 35 (10), pp. 9144-9152
Publication Date: 2021-05-18
ISSN: 2159-5399
EISSN: 2374-3468
DOI: 10.1609/aaai.v35i10.17104