
Improve generated adversarial imitation learning with reward variance regularization

Imitation learning aims at recovering expert policies from limited demonstration data. Generative Adversarial Imitation Learning (GAIL) employs the generative adversarial learning framework for imitation learning and has shown great potential. GAIL and its variants, however, are found to be highly sensitive to hyperparameters and hard to converge well in practice. One key issue is that the supervised-learning discriminator learns much faster than the reinforcement-learning generator, causing the generator gradient to vanish. Although GAIL is formulated as a zero-sum adversarial game, its ultimate goal is to learn the generator, so the discriminator should act more like a teacher than a real opponent, and its training should take into account how the generator can learn. In this paper, we show that enhancing the gradient of the generator during training is equivalent to increasing the variance of the fake reward provided by the discriminator output. We thus propose an improved version of GAIL, GAIL-VR, in which the discriminator also learns to avoid generator gradient vanishing through regularization of the fake-reward variance. Experiments on various tasks, including locomotion tasks and Atari games, indicate that GAIL-VR can improve training stability and imitation scores.
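To make the abstract's core idea concrete, the sketch below is a minimal, hypothetical PyTorch implementation of a GAIL discriminator update with a fake-reward variance regularizer in the spirit of GAIL-VR. The network architecture, the -log(1 - D) reward shaping, and the coefficient `lambda_vr` are illustrative assumptions, not the authors' actual code.

```python
# Hypothetical sketch: GAIL discriminator loss with a fake-reward
# variance bonus, following the idea described in the abstract.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # logit D(s, a)
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def discriminator_loss(disc, expert_obs, expert_act, gen_obs, gen_act,
                       lambda_vr=0.1):
    """Standard GAIL discriminator loss plus a variance bonus on the fake reward.

    The fake reward is taken here as -log(1 - sigmoid(logit)), one common GAIL
    reward shaping; encouraging its variance across generator samples is meant
    to keep the generator's policy-gradient signal from vanishing.
    """
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_obs, expert_act)
    gen_logits = disc(gen_obs, gen_act)

    # Expert transitions labeled 1, generator transitions labeled 0
    # (the usual zero-sum GAIL game).
    adv_loss = (bce(expert_logits, torch.ones_like(expert_logits))
                + bce(gen_logits, torch.zeros_like(gen_logits)))

    # Fake reward handed to the RL generator; its variance is encouraged.
    fake_reward = -torch.log(1.0 - torch.sigmoid(gen_logits) + 1e-8)
    var_bonus = fake_reward.var()

    # Subtracting the variance term asks the discriminator to keep its
    # rewards spread out, i.e. still informative for the generator.
    return adv_loss - lambda_vr * var_bonus
```

In this reading, the regularizer trades a little discriminative accuracy for a reward signal whose spread keeps the generator gradient alive, which matches the abstract's claim that the discriminator should behave more like a teacher than an opponent.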

Bibliographic Details
Published in: Machine learning, 2022-03, Vol. 111 (3), p. 977-995
Main Authors: Zhang, Yi-Feng; Luo, Fan-Ming; Yu, Yang
Format: Article
Language: English
Publisher: New York: Springer US
DOI: 10.1007/s10994-021-06083-7
ISSN: 0885-6125
EISSN: 1573-0565
Subjects: Algorithms; Artificial Intelligence; Computer Science; Control; Locomotion; Machine Learning; Markov analysis; Mechatronics; Natural Language Processing (NLP); Optimization; Regularization; Robotics; Simulation and Modeling; Special Issue of the ACML 2021 Journal Track; Supervised learning; Training; Zero sum games