
Gumbel MuZero for the Game of 2048

Bibliographic Details
Main Authors: Kao, Chih-Yu, Guei, Hung, Wu, Ti-Rong, Wu, I-Chen
Format: Conference Proceeding
Language: English
Online Access: Request full text
Description
Summary: In recent years, AlphaZero and MuZero have achieved remarkable success in a broad range of applications. AlphaZero masters game playing without human knowledge, while MuZero also learns the game rules and environment dynamics without access to a simulator during planning, which makes it applicable to complex environments. Both algorithms adopt Monte Carlo tree search (MCTS) during self-play, usually using hundreds of simulations per move. To handle stochasticity, Stochastic MuZero was proposed to learn a stochastic model and use the learned model to perform the tree search. Recently, Gumbel MuZero was proposed to guarantee policy improvement, allowing it to learn reliably with a small number of simulations. However, Gumbel MuZero uses a deterministic model as in MuZero, limiting its performance in stochastic environments. In this paper, we propose combining Gumbel MuZero and Stochastic MuZero, the first attempt to apply Gumbel MuZero to a stochastic environment. Our experiments on the stochastic puzzle game 2048 demonstrate that the combined algorithm performs well, achieving an average score of 394,645 with only 3 simulations during training, greatly reducing the computational resources needed for training.
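The low simulation budget in Gumbel MuZero rests on the Gumbel-Top-k trick: perturbing the policy logits with Gumbel noise and keeping the k largest values samples k distinct root actions without replacement, which the algorithm then evaluates with its limited simulations. The sketch below illustrates only that sampling step; it is not the authors' implementation, and the function name and toy logits are illustrative.

```python
import numpy as np

def gumbel_top_k(logits, k, rng):
    """Sample k distinct actions without replacement (Gumbel-Top-k trick).

    Adding independent Gumbel(0, 1) noise to the logits and keeping the
    k largest perturbed values draws k actions without replacement from
    the softmax distribution defined by the logits.
    """
    gumbels = rng.gumbel(size=len(logits))   # g_a ~ Gumbel(0, 1), one per action
    perturbed = logits + gumbels             # logits + g
    return np.argsort(perturbed)[::-1][:k]   # indices of the k largest values

# Toy example: 4 legal moves (e.g. up/down/left/right in 2048), budget k = 3.
rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0, 1.0])     # hypothetical policy-network logits
actions = gumbel_top_k(logits, k=3, rng=rng)
print(actions)
```

With only 3 simulations, such a scheme spends the whole budget on the sampled candidates while still improving the policy in expectation, which is what makes the small training budget reported above viable.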
ISSN: 2376-6824
DOI: 10.1109/TAAI57707.2022.00017