More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning
| Published in: | IEEE Transactions on Games, 2024-07, pp. 1-13 |
|---|---|
| Main Authors: | , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Get full text |
| Summary: | Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human-computer interaction, researchers have tried various methods to create human-like AI. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to its parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy. |
| ISSN: | 2475-1502; 2475-1510 |
| DOI: | 10.1109/TG.2024.3424668 |
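
The summary describes the core idea only at a high level: mixing a supervised-learning (SL) policy trained on human game records with an AlphaZero-like reinforcement-learning (RL) policy. As an illustrative sketch only (not the paper's implementation), blending two policy distributions over the same set of legal moves might look like the following; the function name `blend_policies`, the linear mixture form, and the weight `lam` are assumptions made here for illustration.

```python
import numpy as np

def blend_policies(pi_sl, pi_rl, lam):
    """Linearly mix an SL policy with an RL policy over the same legal moves.

    lam in [0, 1] is the assumed blending weight: 0 keeps only the SL policy,
    1 keeps only the RL policy. Both inputs are probability vectors.
    """
    pi_sl = np.asarray(pi_sl, dtype=float)
    pi_rl = np.asarray(pi_rl, dtype=float)
    blended = (1.0 - lam) * pi_sl + lam * pi_rl
    return blended / blended.sum()  # renormalize for numerical safety

# Toy example: probabilities over four legal moves in some position.
pi_sl = [0.50, 0.30, 0.15, 0.05]   # from a policy net trained on human games
pi_rl = [0.10, 0.60, 0.20, 0.10]   # from an AlphaZero-like self-play policy
pi_blend = blend_policies(pi_sl, pi_rl, lam=0.3)
predicted_move = int(np.argmax(pi_blend))  # move predicted as the human choice
print(pi_blend, predicted_move)
```

In this reading, move-matching accuracy would be measured by comparing the arg-max (or top-k) of the blended distribution against the move actually played by a human; how the paper sets or tunes the blending weight is not stated in the record above.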