Improve generated adversarial imitation learning with reward variance regularization
| Published in: | Machine learning, 2022-03, Vol. 111 (3), p. 977-995 |
|---|---|
| Main Authors: | Zhang, Yi-Feng; Luo, Fan-Ming; Yu, Yang |
| Format: | Article |
| Language: | English |
| Subjects: | Algorithms; Artificial Intelligence; Computer Science; Control; Locomotion; Machine Learning; Markov analysis; Mechatronics; Natural Language Processing (NLP); Optimization; Regularization; Robotics; Simulation and Modeling; Special Issue of the ACML 2021 Journal Track; Supervised learning; Training; Zero sum games |
| Publisher: | New York: Springer US |
| ISSN: | 0885-6125 |
| EISSN: | 1573-0565 |
| DOI: | 10.1007/s10994-021-06083-7 |
Abstract: Imitation learning aims to recover expert policies from limited demonstration data. Generative Adversarial Imitation Learning (GAIL) applies the generative adversarial learning framework to imitation learning and has shown great potential. GAIL and its variants, however, are highly sensitive to hyperparameters and hard to converge well in practice. One key issue is that the supervised-learning discriminator learns much faster than the reinforcement-learning generator, causing the generator's gradient to vanish. Although GAIL is formulated as a zero-sum adversarial game, its ultimate goal is to learn the generator, so the discriminator should act more like a teacher than a real opponent. The learning of the discriminator should therefore take into account how the generator can learn. In this paper, we show that enhancing the gradient of generator training is equivalent to increasing the variance of the fake reward provided by the discriminator output. We thus propose an improved version of GAIL, GAIL-VR, in which the discriminator also learns to avoid generator gradient vanishing through regularization of the fake-reward variance. Experiments on various tasks, including locomotion tasks and Atari games, indicate that GAIL-VR improves training stability and imitation scores.
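The mechanism the abstract describes, training the discriminator with an extra term that keeps the variance of the fake reward on generator samples from collapsing, can be illustrated with a short sketch. The code below is a minimal PyTorch illustration, not the authors' implementation: the network architecture, the reward definition r = -log(1 - D(s, a)), and the coefficient `lambda_vr` are assumptions chosen only to show where a fake-reward variance regularizer could enter an otherwise standard GAIL discriminator loss.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Simple MLP discriminator over concatenated (state, action) pairs."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # raw logit; sigmoid is applied in the loss
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def discriminator_loss(disc, expert_obs, expert_act, gen_obs, gen_act,
                       lambda_vr=0.1):
    """Binary-classification GAIL loss plus a fake-reward variance term.

    The variance term is subtracted, so minimizing this loss also pushes the
    discriminator to keep the rewards it assigns to generator samples spread
    out, counteracting the vanishing-gradient issue described in the abstract.
    The reward definition and coefficient here are illustrative assumptions.
    """
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_obs, expert_act)
    gen_logits = disc(gen_obs, gen_act)

    # Standard GAIL discriminator objective: expert -> 1, generator -> 0.
    cls_loss = bce(expert_logits, torch.ones_like(expert_logits)) + \
               bce(gen_logits, torch.zeros_like(gen_logits))

    # Fake reward for generator samples, e.g. r = -log(1 - D(s, a)).
    fake_reward = -torch.log(1.0 - torch.sigmoid(gen_logits) + 1e-8)

    # Encourage a spread-out reward signal instead of a collapsed one.
    return cls_loss - lambda_vr * fake_reward.var()
```

In this reading, the generator side is unchanged: the policy is still optimized with an ordinary reinforcement learning algorithm on the same fake reward, and only the discriminator update carries the extra regularization term.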