Dyna-H: A heuristic planning reinforcement learning algorithm applied to role-playing game strategy decision systems

Bibliographic Details
Published in:	Knowledge-based systems 2012-08, Vol.32, p.28-36
Main Authors:	Santos, Matilde; Martín H., José Antonio; López, Victoria; Botella, Guillermo
Format:	Article
Language:	English
Subjects:	A-star; Algorithms; Decision making; Games; Heuristic-search; Heuristics; Path-finding; Reinforcement-learning

Description
Summary:	In a role-playing game, finding optimal trajectories is one of the most important tasks, which makes the strategy decision system a key component of the game engine. How decisions are taken (e.g. online, batch or simulated) and the resources consumed in making them (e.g. execution time, memory) strongly influence game performance. When classical search algorithms such as A∗ can be used, they are the first option. However, such methods rely on precise and complete models of the search space, so there are many interesting scenarios where they cannot be applied, and model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy that gives a Dyna agent the heuristic-search ability used in path-finding. The proposed Dyna-H algorithm selects branches more likely to produce outcomes than other branches, just as A∗ does. However, unlike A∗, it has the advantages of a model-free online reinforcement learning algorithm. We evaluate the proposed algorithm against one-step Q-learning and Dyna-Q and find that Dyna-H produces clearly superior results.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2011.09.008
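
Illustrative sketch: the summary describes adding heuristic search to the planning phase of a Dyna agent. The following is a minimal Python sketch of that general idea on a small grid world, not the authors' implementation. The record does not state the exact selection rule, so everything specific here is an assumption: the function name `dyna_heuristic`, the 5x5 grid, the Manhattan-distance heuristic, the greedy choice of the simulated action whose remembered successor lies closest to the goal, and all parameter values.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four grid moves


def manhattan(a, b):
    """Heuristic used in planning (an assumption, not taken from the paper)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step(state, action, size, goal):
    """Deterministic move; the agent is clamped inside the grid."""
    nxt = (min(max(state[0] + action[0], 0), size - 1),
           min(max(state[1] + action[1], 0), size - 1))
    return nxt, (1.0 if nxt == goal else 0.0)


def dyna_heuristic(size=5, goal=(4, 4), episodes=30, planning_steps=10,
                   alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}      # (state, action index) -> estimated value
    model = {}  # (state, action index) -> (next state, reward) seen so far

    def greedy(s):
        values = [q.get((s, a), 0.0) for a in range(len(ACTIONS))]
        top = max(values)
        return rng.choice([a for a, v in enumerate(values) if v == top])

    def update(s, a, r, s2):
        # One-step Q-learning backup.
        target = r + gamma * max(q.get((s2, b), 0.0)
                                 for b in range(len(ACTIONS)))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)

    steps_per_episode = []
    for _ in range(episodes):
        s, steps = (0, 0), 0
        while s != goal and steps < 500:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(s)
            s2, r = step(s, ACTIONS[a], size, goal)
            update(s, a, r, s2)        # direct learning from real experience
            model[(s, a)] = (s2, r)    # remember the observed transition

            # Planning: replay a simulated trajectory from the learned model,
            # choosing at each step the remembered action whose successor
            # the heuristic rates closest to the goal.
            p = s
            for _ in range(planning_steps):
                known = [b for b in range(len(ACTIONS)) if (p, b) in model]
                if not known:
                    break
                b = min(known, key=lambda k: manhattan(model[(p, k)][0], goal))
                p2, pr = model[(p, b)]
                update(p, b, pr, p2)
                if p2 == goal:
                    break
                p = p2

            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return q, steps_per_episode
```

Calling `dyna_heuristic()` returns the learned value table and the number of real steps taken in each episode. Replacing the `min(...)` selection with a uniformly random choice over remembered transitions gives a plain Dyna-Q style planner, which offers a simple baseline for comparison.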