Convergence Rates of Asynchronous Policy Iteration for Zero-Sum Markov Games under Stochastic and Optimistic Settings
Main Authors:
Format: Conference Proceeding
Language: English
Summary: We consider an asynchronous policy iteration method for solving the problem of sequential zero-sum Markov games. This method can be viewed as a recursive iteration for finding the fixed point of a contractive Bellman operator. Our focus in this paper is to derive the convergence rates of this method under two scenarios, namely, stochastic and optimistic settings. In the first scenario, we assume that we only have access to an unbiased estimate of the underlying Bellman operator, resulting in a stochastic variant of the policy iteration method. In the second, we consider the popular optimistic setting of the policy iteration method, where the policy improvement steps are approximately estimated. In both scenarios, we show that the asynchronous policy iteration method converges with a sublinear rate \mathcal{O}(1/k), where k is the number of iterations. We also provide numerical simulations to illustrate our theoretical results.
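The abstract frames the method as a fixed-point iteration of a contractive Bellman (Shapley) operator, applied asynchronously so that only some states are refreshed at each step. The sketch below is a minimal, hypothetical illustration of that fixed-point view on a toy zero-sum Markov game: the stage game at each state is solved as a matrix game via a linear program, and states are updated cyclically. It is not a reproduction of the paper's stochastic or optimistic policy-iteration variants; the helper names (matrix_game_value, async_bellman_sweep) and all problem data (R, P, gamma) are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog


def matrix_game_value(G):
    """Value of the zero-sum matrix game max_x min_y x^T G y, solved as an LP."""
    m, n = G.shape
    # Decision variables: row player's mixed strategy x (m entries) and the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                           # maximize v  <=>  minimize -v
    A_ub = np.hstack([-G.T, np.ones((n, 1))])              # v <= sum_a x_a * G[a, b] for every column b
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # x must sum to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]              # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]


def async_bellman_sweep(V, R, P, gamma, states):
    """Apply the Shapley/Bellman operator only at the listed states (asynchronous update)."""
    V = V.copy()
    for s in states:
        # Stage-game payoffs at state s: immediate reward plus discounted continuation value.
        G = R[s] + gamma * np.tensordot(P[s], V, axes=([2], [0]))
        V[s] = matrix_game_value(G)
    return V


# Hypothetical toy game: 3 states, 2 actions per player, random rewards and transitions.
rng = np.random.default_rng(0)
nS, nA, nB, gamma = 3, 2, 2, 0.9
R = rng.uniform(-1.0, 1.0, size=(nS, nA, nB))              # R[s, a, b]
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nB))          # P[s, a, b, s']
V = np.zeros(nS)

# Cyclic asynchronous iteration: one state is refreshed per step.
for k in range(300):
    V = async_bellman_sweep(V, R, P, gamma, [k % nS])
print("approximate game values per state:", V)
```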
ISSN: 2576-2370
DOI: 10.1109/CDC51059.2022.9993200