More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning
Published in: | IEEE Transactions on Games, 2024-07, pp. 1-13 |
---|---|
Main Authors: | Ogawa, Tatsuyoshi; Hsueh, Chu-Hsuan; Ikeda, Kokolo |
Format: | Article |
Language: | English |
Subjects: | Accuracy; Artificial intelligence; Board Game; Games; Human-Likeness; Neural networks; Player Modeling; Predictive models; Reinforcement learning; Supervised learning |
container_title | IEEE Transactions on Games |
---|---|
container_start_page | 1 |
container_end_page | 13 |
creator | Ogawa, Tatsuyoshi; Hsueh, Chu-Hsuan; Ikeda, Kokolo |
description | Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human-computer interaction, researchers have tried various methods to create human-like AI. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to its blending parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy. |
doi_str_mv | 10.1109/TG.2024.3424668 |
format | article |
identifier | ISSN: 2475-1502 |
ispartof | IEEE Transactions on Games, 2024-07, pp. 1-13 |
issn | 2475-1502 (print); 2475-1510 (electronic) |
language | eng |
source | IEEE Xplore (Online service) |
subjects | Accuracy; Artificial intelligence; Board Game; Games; Human-Likeness; Neural networks; Player Modeling; Predictive models; Reinforcement learning; Supervised learning |
title | More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning |
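The abstract above describes blending a supervised-learning (SL) policy, trained on human game records, with an AlphaZero-like reinforcement-learning (RL) policy to improve move-matching accuracy. This record does not give the paper's exact blending formula or parameter, so the following is only a minimal illustrative sketch under the assumption that "blending" means a convex combination of the two move distributions; the names `blend_policies` and `lam` are hypothetical, chosen here for illustration.

```python
import numpy as np

def blend_policies(p_sl, p_rl, lam=0.5):
    """Blend an SL policy with an RL policy over the same set of legal moves.

    p_sl, p_rl : move-probability arrays of equal length.
    lam        : hypothetical blending weight in [0, 1]; lam=1 keeps only the
                 SL (human-imitation) policy, lam=0 only the RL policy.
    This convex combination is an assumption about what "blending" could look
    like, not the formula from the paper itself.
    """
    p_sl = np.asarray(p_sl, dtype=float)
    p_rl = np.asarray(p_rl, dtype=float)
    blended = lam * p_sl + (1.0 - lam) * p_rl
    return blended / blended.sum()  # renormalize for numerical safety

# Toy example: three legal moves on which the SL and RL policies disagree.
p_sl = [0.6, 0.3, 0.1]   # human-like, but trained on few game records
p_rl = [0.2, 0.7, 0.1]   # strong AlphaZero-like policy
print(blend_policies(p_sl, p_rl, lam=0.7))
```

In a setup like this, the blending weight would typically be tuned on held-out human game records, e.g. by picking the value that maximizes move-matching accuracy, which is consistent with the parameter-robustness discussion mentioned in the abstract.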