
More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning

Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human-computer interaction, researchers have tried various methods to create human-like AI. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to the blending parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy.
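The record does not state how the two policies are combined, so the following is only a rough sketch under the assumption of a simple linear interpolation of the two move distributions; the function and parameter names (blend_policy, alpha) are hypothetical and not taken from the paper.

import numpy as np

def blend_policy(p_sl, p_rl, alpha):
    # p_sl: move probabilities from the supervised (human-move prediction) model
    # p_rl: move probabilities from the AlphaZero-like reinforcement-learning model
    # alpha: blending weight; 0 -> pure SL policy, 1 -> pure RL policy
    blended = (1.0 - alpha) * np.asarray(p_sl) + alpha * np.asarray(p_rl)
    return blended / blended.sum()  # renormalize to guard against rounding error

# Toy example with three legal moves where the two policies disagree:
print(blend_policy([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], alpha=0.3))

In this toy run the blended distribution keeps the supervised model's preferred move while shifting some probability toward the move favored by the reinforcement-learning policy.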


Bibliographic Details
Published in: IEEE Transactions on Games, 2024-07, p. 1-13
Main Authors: Ogawa, Tatsuyoshi, Hsueh, Chu-Hsuan, Ikeda, Kokolo
Format: Article
Language: English
Subjects: Accuracy; Artificial intelligence; Board Game; Games; Human-Likeness; Neural networks; Player Modeling; Predictive models; Reinforcement learning; Supervised learning
Online Access: https://ieeexplore.ieee.org/document/10595450
DOI: 10.1109/TG.2024.3424668
ISSN: 2475-1502
EISSN: 2475-1510
Publisher: IEEE