More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning
Published in: | IEEE Transactions on Games, 2024-07, pp. 1-13 |
---|---|
Main Authors: | Ogawa, Tatsuyoshi; Hsueh, Chu-Hsuan; Ikeda, Kokolo |
Format: | Article |
Language: | English |
Subjects: | Accuracy; Artificial intelligence; Board Game; Games; Human-Likeness; Neural networks; Player Modeling; Predictive models; Reinforcement learning; Supervised learning |
container_title | IEEE Transactions on Games |
---|---|
container_start_page | 1 |
container_end_page | 13 |
creator | Ogawa, Tatsuyoshi; Hsueh, Chu-Hsuan; Ikeda, Kokolo |
description | Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human-computer interaction, researchers have tried various methods to create human-like AI. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records also showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to its blending parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy. |
doi_str_mv | 10.1109/TG.2024.3424668 |
format | article |
identifier | ISSN: 2475-1502 |
ispartof | IEEE Transactions on Games, 2024-07, pp. 1-13 |
issn | 2475-1502 (print); 2475-1510 (electronic) |
language | eng |
source | IEEE Xplore (Online service) |
subjects | Accuracy; Artificial intelligence; Board Game; Games; Human-Likeness; Neural networks; Player Modeling; Predictive models; Reinforcement learning; Supervised learning |
title | More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning |
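The abstract above describes blending a supervised-learning (SL) policy, trained on human game records, with an AlphaZero-like reinforcement-learning (RL) policy to improve move-matching accuracy. This record does not give the paper's exact blending formula or parameter, so the following is only a minimal illustrative sketch under the assumption that "blending" means a convex combination of the two move distributions; the names `blend_policies` and `lam` are hypothetical, chosen here for illustration.

```python
import numpy as np

def blend_policies(p_sl, p_rl, lam=0.5):
    """Blend an SL policy with an RL policy over the same set of legal moves.

    p_sl, p_rl : move-probability arrays of equal length.
    lam        : hypothetical blending weight in [0, 1]; lam=1 keeps only the
                 SL (human-imitation) policy, lam=0 only the RL policy.
    This convex combination is an assumption about what "blending" could look
    like, not the formula from the paper itself.
    """
    p_sl = np.asarray(p_sl, dtype=float)
    p_rl = np.asarray(p_rl, dtype=float)
    blended = lam * p_sl + (1.0 - lam) * p_rl
    return blended / blended.sum()  # renormalize for numerical safety

# Toy example: three legal moves on which the SL and RL policies disagree.
p_sl = [0.6, 0.3, 0.1]   # human-like, but trained on few game records
p_rl = [0.2, 0.7, 0.1]   # strong AlphaZero-like policy
print(blend_policies(p_sl, p_rl, lam=0.7))
```

In a setup like this, the blending weight would typically be tuned on held-out human game records, e.g. by picking the value that maximizes move-matching accuracy, which is consistent with the parameter-robustness discussion mentioned in the abstract.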