Inverse Reinforcement Learning with Agents’ Biased Exploration Based on Sub-Optimal Sequential Action Data
Abstract: Inverse reinforcement learning (IRL) estimates a reward function under which an agent behaves consistently with expert data, e.g., human operation data. However, expert data usually contain redundant parts, which degrade the agent's performance. This study extends IRL to sub-optimal action data, including lacks and detours. The proposed method searches for new actions to determine optimal expert action data. This study adopted maze problems with sub-optimal expert action data to investigate the performance of the proposed method. The experimental results show that the proposed method finds optimal expert data better than the conventional method, and that the proposed search mechanisms perform better than random search.
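For context on the technique the abstract names, the sketch below is a minimal tabular maximum-entropy IRL loop on a small open gridworld maze: it recovers a per-state reward whose soft-optimal policy reproduces the expert's state-visitation frequencies. This is a generic textbook baseline, not the authors' biased-exploration method; the maze layout, the function names (`step`, `soft_value_iteration`, `expected_visits`, `maxent_irl`), and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# 4-connected N x N maze; a state is a cell index, and the reward is one
# learnable weight per state (i.e., state-indicator features).
N = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(s, a):
    """Deterministic transition; bumping into the border leaves s unchanged."""
    r, c = divmod(s, N)
    dr, dc = ACTIONS[a]
    nr, nc = r + dr, c + dc
    return nr * N + nc if 0 <= nr < N and 0 <= nc < N else s

def soft_value_iteration(reward, gamma=0.95, iters=100):
    """Soft (log-sum-exp) Bellman backups; returns a Boltzmann policy."""
    n_s, n_a = N * N, len(ACTIONS)
    v = np.zeros(n_s)
    q = np.zeros((n_s, n_a))
    for _ in range(iters):
        q = np.array([[reward[s] + gamma * v[step(s, a)]
                       for a in range(n_a)] for s in range(n_s)])
        v = np.logaddexp.reduce(q, axis=1)  # soft maximum over actions
    return np.exp(q - v[:, None])  # pi(a|s); each row sums to 1

def expected_visits(pi, start, horizon=30):
    """Expected state-visitation counts of the policy over a finite horizon."""
    d = np.zeros(N * N)
    d[start] = 1.0
    total = d.copy()
    for _ in range(horizon):
        nd = np.zeros_like(d)
        for s in range(N * N):
            for a in range(len(ACTIONS)):
                nd[step(s, a)] += d[s] * pi[s, a]
        d = nd
        total += d
    return total

def maxent_irl(trajectories, lr=0.1, epochs=50):
    """Match the learner's expected visitations to the expert's empirical ones."""
    reward = np.zeros(N * N)
    emp = np.zeros(N * N)  # empirical visitation counts from expert data
    for traj in trajectories:
        for s in traj:
            emp[s] += 1.0
    emp /= len(trajectories)
    start = trajectories[0][0]
    for _ in range(epochs):
        pi = soft_value_iteration(reward)
        # MaxEnt IRL gradient for indicator features:
        # (expert visitations) - (visitations induced by the current reward).
        reward += lr * (emp - expected_visits(pi, start))
    return reward

# Hypothetical usage: one (possibly redundant) expert path starting at cell 0.
# reward = maxent_irl([[0, 1, 2, 7, 12, 13, 14, 19, 24]])
```

Because the gradient compares visitation counts, redundant or detouring expert trajectories directly bias the recovered reward; that is the failure mode the paper's search for new actions over sub-optimal expert data is designed to address.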
Published in: Journal of Advanced Computational Intelligence and Intelligent Informatics, 2024-03, Vol. 28 (2), pp. 380-392
Main Authors: Uwano, Fumito; Hasegawa, Satoshi; Takadama, Keiki
Format: Article
Language: English
Subjects: Bias; Engineering; Entropy; Informatics; Methods
DOI: 10.20965/jaciii.2024.p0380
ISSN: 1343-0130
EISSN: 1883-8014
Publisher: Fuji Technology Press Co. Ltd., Tokyo
Source: DOAJ Directory of Open Access Journals