Convergence Rates of Asynchronous Policy Iteration for Zero-Sum Markov Games under Stochastic and Optimistic Settings
Main Authors:
Format: Conference Proceeding
Language: English
Summary: We consider an asynchronous policy iteration method for solving the problem of sequential zero-sum Markov games. This method can be viewed as a recursive iteration for finding the fixed point of a contractive Bellman operator. Our focus in this paper is to derive the convergence rates of this method under two scenarios, namely, stochastic and optimistic settings. In the first scenario, we assume that we only have access to an unbiased estimate of the underlying Bellman operator, resulting in a stochastic variant of the policy iteration method. In the second, we consider the popular optimistic setting of the policy iteration method, where the policy improvement steps are approximately estimated. In both scenarios, we show that the asynchronous policy iteration method converges with a sublinear rate \mathcal{O}(1/k), where k is the number of iterations. We also provide numerical simulations to illustrate our theoretical results.
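The abstract frames the method as a fixed-point iteration of a contractive Bellman (Shapley) operator, applied asynchronously so that only some states are refreshed at each step. The sketch below is a minimal, hypothetical illustration of that fixed-point view on a toy zero-sum Markov game: the stage game at each state is solved as a matrix game via a linear program, and states are updated cyclically. It is not a reproduction of the paper's stochastic or optimistic policy-iteration variants; the helper names (matrix_game_value, async_bellman_sweep) and all problem data (R, P, gamma) are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(G):
    """Value of the zero-sum matrix game max_x min_y x^T G y, solved as an LP."""
    m, n = G.shape
    # Decision variables: row player's mixed strategy x (m entries) and the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                           # maximize v  <=>  minimize -v
    A_ub = np.hstack([-G.T, np.ones((n, 1))])              # v <= sum_a x_a * G[a, b] for every column b
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # x must sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def async_bellman_sweep(V, R, P, gamma, states):
    """Apply the Shapley/Bellman operator only at the listed states (asynchronous update)."""
    V = V.copy()
    for s in states:
        # Stage-game payoffs at state s: immediate reward plus discounted continuation value.
        G = R[s] + gamma * np.tensordot(P[s], V, axes=([2], [0]))
        V[s] = matrix_game_value(G)
    return V


# Hypothetical toy game: 3 states, 2 actions per player, random rewards and transitions.
rng = np.random.default_rng(0)
nS, nA, nB, gamma = 3, 2, 2, 0.9
R = rng.uniform(-1.0, 1.0, size=(nS, nA, nB))              # R[s, a, b]
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))          # P[s, a, b, s']
V = np.zeros(nS)

# Cyclic asynchronous iteration: one state is refreshed per step.
for k in range(300):
    V = async_bellman_sweep(V, R, P, gamma, [k % nS])
print("approximate game values per state:", V)
```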
ISSN: 2576-2370
DOI: 10.1109/CDC51059.2022.9993200