
Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs

Bibliographic Details
Published in: Automatica (Oxford), 2001-07, Vol. 37 (7), p. 1007-1018
Main Authors: Najim, K., Poznyak, A.S., Gomez, E.
Format: Article
Language: English
Description
Summary: A zero-sum stochastic game of two finite Markov chains with unknown transition matrices and average payoffs is considered. The control objective of the participants is to optimize the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle point (equilibrium point), and a new normalization procedure, used in the adaptive strategy, which asymptotically realizes this equilibrium. The saddle point is shown to be unique. The convergence properties are stated, and it is shown that this adaptive control algorithm has a convergence rate of order $n^{-1/3}$.
ISSN: 0005-1098, 1873-2836
DOI: 10.1016/S0005-1098(01)00050-4