Loading…

A Generalized Minimax Q-Learning Algorithm for Two-Player Zero-Sum Stochastic Games

We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in this article. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this article, we compute the solution of the two...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on automatic control 2022-09, Vol.67 (9), p.4816-4823
Main Authors:	Diddigi, Raghuram Bharadwaj, Kamanchi, Chandramouli, Bhatnagar, Shalabh
Format:	Article
Language:	English
Subjects:	Algorithms Computational modeling Convergence Game theory Games Iterative algorithms Iterative methods Machine learning Markov processes Minimax Q-learning Minimax technique Q-learning successive relaxation two-player zero-sum games Zero sum games
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	We consider the problem of two-player zero-sum games. This problem is formulated as a min-max Markov game in this article. The solution of this game, which is the min-max payoff, starting from a given state is called the min-max value of the state. In this article, we compute the solution of the two-player zero-sum game, utilizing the technique of successive relaxation that has been successfully applied in this article to compute a faster value iteration algorithm in the context of Markov decision processes. We extend the concept of successive relaxation to the setting of two-player zero-sum games. We show that, under a special structure on the game, this technique facilitates faster computation of the min-max value of the states. We then derive a generalized minimax Q-learning algorithm, which computes the optimal policy when the model information is not known. Finally, we prove the convergence of the proposed generalized minimax Q-learning algorithm utilizing stochastic approximation techniques, under an assumption on the boundedness of iterates. Through experiments, we demonstrate the
ISSN:	0018-9286 1558-2523
DOI:	10.1109/TAC.2022.3159453