
Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data

Inverse reinforcement learning (IRL) estimates a reward function so that an agent behaves in accordance with expert data, e.g., human operation data. However, expert data usually contain redundant parts, which degrade the agent’s performance. This study extends IRL to sub-optimal action data, including missing actions and detours. The proposed method searches for new actions to determine optimal expert action data. This study adopted maze problems with sub-optimal expert action data to investigate the performance of the proposed method. The experimental results show that the proposed method finds optimal expert data better than the conventional method, and the proposed search mechanisms perform better than random search.
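The trajectory-repair idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm (which biases the search by the agent's exploration); it only shows, under assumed names (`repair_trajectory`, the `maze` layout, the `detour` trajectory are all hypothetical), how a detoured expert trajectory in a maze can be replaced by a shorter candidate path between the same start and goal, here found by breadth-first search:

```python
from collections import deque

def repair_trajectory(grid, traj):
    """Replace a detoured expert trajectory with a shortest path
    between its start and goal, found by breadth-first search.
    grid: list of strings where '#' marks a wall and '.' a free cell;
    traj: list of (row, col) states visited by the expert."""
    start, goal = traj[0], traj[-1]
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}          # predecessor map; doubles as the visited set
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    # Walk back from the goal to reconstruct the shortest path.
    path, state = [], goal
    while state is not None:
        path.append(state)
        state = prev[state]
    return path[::-1]

# Hypothetical maze and a detoured expert trajectory around the walls.
maze = ["....",
        ".##.",
        "...."]
detour = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (1, 3), (0, 3)]
repaired = repair_trajectory(maze, detour)  # direct route along the top row
```

A reward function estimated by IRL from `repaired` would then no longer be pulled toward the redundant detour states.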

Bibliographic Details
Published in: Journal of advanced computational intelligence and intelligent informatics, 2024-03, Vol. 28 (2), p. 380-392
Main Authors: Uwano, Fumito, Hasegawa, Satoshi, Takadama, Keiki
Format: Article
Language:English
DOI: 10.20965/jaciii.2024.p0380
ISSN: 1343-0130
EISSN: 1883-8014
Published: Tokyo: Fuji Technology Press Co. Ltd
Subjects: Bias; Engineering; Entropy; Informatics; Methods