Gumbel MuZero for the Game of 2048
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | In recent years, AlphaZero and MuZero have achieved remarkable success in a broad range of applications. AlphaZero masters game playing without human knowledge, while MuZero additionally learns the game rules and environment dynamics without access to a simulator during planning, which makes it applicable to complex environments. Both algorithms use Monte Carlo tree search (MCTS) during self-play, usually with hundreds of simulations per move. To handle stochasticity, Stochastic MuZero was proposed; it learns a stochastic model and uses the learned model to perform the tree search. More recently, Gumbel MuZero was proposed to guarantee policy improvement, so it can learn reliably with a small number of simulations. However, Gumbel MuZero uses a deterministic model, as in MuZero, which limits its performance in stochastic environments. In this paper, we propose combining Gumbel MuZero and Stochastic MuZero, the first attempt to apply Gumbel MuZero to a stochastic environment. Our experiment on the stochastic puzzle game 2048 demonstrates that the combined algorithm performs well, achieving an average score of 394,645 with only 3 simulations during training and greatly reducing the computational resources needed for training. |
---|---|
ISSN: | 2376-6824 |
DOI: | 10.1109/TAAI57707.2022.00017 |