An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients
Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use...
Main Authors: | Carvalho, Joao; Tateo, Davide; Muratore, Fabio; Peters, Jan |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Approximation algorithms; Frequency estimation; Inference algorithms; Maximum likelihood estimation; Neural networks; Reinforcement learning |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 10 |
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Carvalho, Joao; Tateo, Davide; Muratore, Fabio; Peters, Jan |
description | Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high variance estimates. More modern approaches exploit the reparametrization trick, which gives lower variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, both in low and high-dimensional action spaces. |
doi_str_mv | 10.1109/IJCNN52387.2021.9533642 |
format | conference_proceeding |
fullrecord | Carvalho, Joao; Tateo, Davide; Muratore, Fabio; Peters, Jan: "An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients." 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021-07-18, pp. 1-10. EISSN: 2161-4407; EISBN: 1665439009, 9781665439008; DOI: 10.1109/IJCNN52387.2021.9533642; IEEE Xplore record 9533642; subjects: Approximation algorithms, Frequency estimation, Inference algorithms, Maximum likelihood estimation, Neural networks, Reinforcement learning. |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2161-4407 |
ispartof | 2021 International Joint Conference on Neural Networks (IJCNN), 2021, p.1-10 |
issn | 2161-4407 |
language | eng |
recordid | cdi_ieee_primary_9533642 |
source | IEEE Xplore All Conference Series |
subjects | Approximation algorithms; Frequency estimation; Inference algorithms; Maximum likelihood estimation; Neural networks; Reinforcement learning |
title | An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T11%3A05%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=An%20Empirical%20Analysis%20of%20Measure-Valued%20Derivatives%20for%20Policy%20Gradients&rft.btitle=2021%20International%20Joint%20Conference%20on%20Neural%20Networks%20(IJCNN)&rft.au=Carvalho,%20Joao&rft.date=2021-07-18&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.eissn=2161-4407&rft_id=info:doi/10.1109/IJCNN52387.2021.9533642&rft.eisbn=1665439009&rft.eisbn_list=9781665439008&rft_dat=%3Cieee_CHZPO%3E9533642%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-3b25249f88fbec460eec02b5f616e8b1c8ca64e5038ebad4909a2b7545b032ed3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9533642&rfr_iscdi=true |
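
The abstract above contrasts three stochastic gradient estimators for Gaussian policies: the likelihood-ratio (score-function) trick, the reparametrization (pathwise) trick, and the Measure-Valued Derivative (MVD). As an illustrative sketch only, not the authors' actor-critic implementation from the paper, the NumPy snippet below compares the three estimators on a one-dimensional Gaussian; the quadratic test function `f`, the function names, and all hyperparameters are assumptions made for this example, with `f` standing in for a critic. The MVD uses the standard decomposition of the Gaussian-mean derivative into positive and negative parts sampled via a Weibull distribution with shape 2 and scale √2 (a unit Rayleigh).

```python
import numpy as np


def f(x):
    # Stand-in for a critic / return estimate (hypothetical choice for this sketch);
    # with f(x) = x**2 the true gradient of E_{x~N(mu, sigma^2)}[f(x)] w.r.t. mu is 2*mu.
    return x ** 2


def score_function_grad(mu, sigma, n, rng):
    # Likelihood-ratio (REINFORCE) estimator:
    # E_x[ f(x) * d/dmu log N(x; mu, sigma^2) ] = E_x[ f(x) * (x - mu) / sigma^2 ]
    x = rng.normal(mu, sigma, size=n)
    return np.mean(f(x) * (x - mu) / sigma ** 2)


def reparametrization_grad(mu, sigma, n, rng, h=1e-4):
    # Pathwise estimator: sample x = mu + sigma * eps and differentiate f through x.
    # f'(x) is approximated with central finite differences to keep the sketch
    # dependency-free; with autodiff one would backpropagate through f instead.
    eps = rng.normal(0.0, 1.0, size=n)
    x = mu + sigma * eps
    return np.mean((f(x + h) - f(x - h)) / (2.0 * h))


def measure_valued_grad(mu, sigma, n, rng):
    # Measure-valued derivative of the Gaussian mean:
    #   d/dmu E[f(x)] = c * ( E_w[f(mu + sigma*w)] - E_w[f(mu - sigma*w)] ),
    # where w follows a Weibull(shape=2, scale=sqrt(2)) distribution (a unit
    # Rayleigh) and c = 1 / (sigma * sqrt(2*pi)).
    w = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return c * (np.mean(f(mu + sigma * w)) - np.mean(f(mu - sigma * w)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma, n = 1.5, 0.5, 10_000
    print("true gradient      :", 2.0 * mu)
    print("score function     :", score_function_grad(mu, sigma, n, rng))
    print("reparametrization  :", reparametrization_grad(mu, sigma, n, rng))
    print("measure-valued der.:", measure_valued_grad(mu, sigma, n, rng))
```

Note that the pathwise estimator needs the derivative of `f` (approximated here by finite differences), whereas the score-function and measure-valued estimators only require function evaluations, which is why the MVD can be paired with non-differentiable critics, as the abstract points out.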