More Human-Like Gameplay by Blending Policies from Supervised and Reinforcement Learning
| Published in: | IEEE Transactions on Games, 2024-07, pp. 1-13 |
|---|---|
| Main Authors: | , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Get full text |
| Summary: | Modeling human players' behaviors in games is a key challenge for making natural computer players, evaluating games, and generating content. To achieve better human-computer interaction, researchers have tried various methods to create human-like AI. In chess and Go, supervised learning with deep neural networks is known as one of the most effective ways to predict human moves. However, for many other games (e.g., Shogi), it is hard to collect a similar amount of game records, resulting in poor move-matching accuracy of the supervised learning. We propose a method to compensate for the weakness of the supervised learning policy by blending it with an AlphaZero-like reinforcement learning policy. Experiments on Shogi showed that the Blend method significantly improved the move-matching accuracy over supervised learning models. Experiments on chess and Go with a limited number of game records showed similar results. The Blend method was effective with both medium and large numbers of games, particularly in the medium case. We confirmed the robustness of the Blend model to its parameter and discussed the mechanism by which the move-matching accuracy improves. In addition, we showed that the Blend model performed better than existing work that tried to improve the move-matching accuracy. |
| ISSN: | 2475-1502; 2475-1510 |
| DOI: | 10.1109/TG.2024.3424668 |
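
The summary describes the core idea only at a high level: mixing a supervised-learning (SL) policy trained on human game records with an AlphaZero-like reinforcement-learning (RL) policy. As an illustrative sketch only (not the paper's implementation), blending two policy distributions over the same set of legal moves might look like the following; the function name `blend_policies`, the linear mixture form, and the weight `lam` are assumptions made here for illustration.

```python
import numpy as np

def blend_policies(pi_sl, pi_rl, lam):
    """Linearly mix an SL policy with an RL policy over the same legal moves.

    lam in [0, 1] is the assumed blending weight: 0 keeps only the SL policy,
    1 keeps only the RL policy. Both inputs are probability vectors.
    """
    pi_sl = np.asarray(pi_sl, dtype=float)
    pi_rl = np.asarray(pi_rl, dtype=float)
    blended = (1.0 - lam) * pi_sl + lam * pi_rl
    return blended / blended.sum()  # renormalize for numerical safety

# Toy example: probabilities over four legal moves in some position.
pi_sl = [0.50, 0.30, 0.15, 0.05]   # from a policy net trained on human games
pi_rl = [0.10, 0.60, 0.20, 0.10]   # from an AlphaZero-like self-play policy
pi_blend = blend_policies(pi_sl, pi_rl, lam=0.3)
predicted_move = int(np.argmax(pi_blend))  # move predicted as the human choice
print(pi_blend, predicted_move)
```

In this reading, move-matching accuracy would be measured by comparing the arg-max (or top-k) of the blended distribution against the move actually played by a human; how the paper sets or tunes the blending weight is not stated in the record above.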