Adaptive policy for two finite Markov chains zero-sum stochastic game with unknown transition matrices and average payoffs
Published in: | Automatica (Oxford) 2001-07, Vol.37 (7), p.1007-1018 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | A zero-sum stochastic game of two finite Markov chains with unknown transition matrices and average payoffs is considered. The control objective of the participants is the optimization of the limiting average payoff. The behaviour of each player is modelled by a finite controlled Markov chain. A novel adaptive policy based on Lagrange multipliers is developed. We introduce a regularized Lagrange function to guarantee the uniqueness of the corresponding saddle point (equilibrium point), together with a new normalization procedure, used within the adaptive strategy, that asymptotically realizes this equilibrium. The saddle point is shown to be unique. The convergence properties are stated, and it is shown that this adaptive control algorithm has a rate of convergence of order $n^{-1/3}$. |
---|---|
ISSN: | 0005-1098 1873-2836 |
DOI: | 10.1016/S0005-1098(01)00050-4 |
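The setting in the summary involves a finite controlled Markov chain whose transition matrices are unknown and whose performance is measured by the limiting average payoff. A minimal NumPy sketch of two generic building blocks for one such chain follows: estimating transition probabilities from observed transitions, and evaluating the limiting average payoff of a stationary randomized strategy through the chain's stationary distribution. It assumes an ergodic chain; the array layout `pi[k, i, j]`, the helper names, and the uniform fallback for unvisited state-action pairs are assumptions of this sketch, not the authors' notation or their adaptive policy.

```python
import numpy as np


def estimate_transitions(transitions, n_states, n_actions):
    """Empirical estimate pi[k, i, j] of P(next = j | state = i, action = k).

    `transitions` is an iterable of observed (state, action, next_state) triples.
    Assumption of this sketch: a (state, action) pair never observed gets a
    uniform row, so every row is still a probability distribution.
    """
    counts = np.zeros((n_actions, n_states, n_states))
    for s, a, s_next in transitions:
        counts[a, s, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), uniform)


def induced_chain(pi, strategy):
    """Transition matrix P[i, j] = sum_k d[i, k] * pi[k, i, j] of the chain
    obtained under a stationary randomized strategy d[i, k] = P(action k | state i)."""
    return np.einsum("ik,kij->ij", strategy, pi)


def stationary_distribution(P):
    """Solve p P = p with sum(p) = 1 (assumes an ergodic chain, so p is unique)."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0  # replace one redundant balance equation by the normalization
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def average_payoff(pi, strategy, payoff):
    """Limiting average payoff sum_i p_i * sum_k d[i, k] * payoff[i, k]."""
    p = stationary_distribution(induced_chain(pi, strategy))
    return float(p @ np.sum(strategy * payoff, axis=1))


if __name__ == "__main__":
    # One action, two states: P = [[0.9, 0.1], [0.5, 0.5]], payoff 1 in state 0.
    pi = np.array([[[0.9, 0.1], [0.5, 0.5]]])
    d = np.ones((2, 1))
    v = np.array([[1.0], [0.0]])
    print(average_payoff(pi, d, v))  # stationary p = (5/6, 1/6), so about 0.8333
```

Passing the output of `estimate_transitions` to `average_payoff` gives a plug-in (certainty-equivalence) estimate of the limiting average payoff. This is a generic technique and is not claimed to be the adaptive policy developed in the paper.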
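The summary attributes the uniqueness of the equilibrium to a regularized Lagrange function. The effect of such regularization can be seen on a plain matrix game, used here only as a stand-in for the paper's Lagrange function. Adding $\tfrac{\delta}{2}\|x\|^2 - \tfrac{\delta}{2}\|y\|^2$ to the bilinear payoff $x^\top A y$ makes it strongly convex in $x$ and strongly concave in $y$. The saddle point over the two probability simplices is therefore unique, and projected gradient descent-ascent with step $\delta/L^2$, where $L = \|A\|_2 + \delta$, converges to it. The function names, the step-size rule and the convention that $x$ minimizes are assumptions of this sketch. It is not the authors' adaptive algorithm or normalization procedure.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def regularized_saddle_point(A, delta=0.1, iters=5000):
    """Approximate the unique saddle point of
        L(x, y) = x^T A y + (delta / 2) * |x|^2 - (delta / 2) * |y|^2
    over two probability simplices, with x minimizing and y maximizing.

    The gradient field is delta-strongly monotone and (|A|_2 + delta)-Lipschitz,
    so projected simultaneous descent-ascent with step delta / lip**2 is a
    contraction whose fixed point is the saddle point.
    """
    m, n = A.shape
    lip = np.linalg.norm(A, 2) + delta  # largest singular value plus delta
    step = delta / lip**2
    x = np.full(m, 1.0 / m)
    y = np.full(n, 1.0 / n)
    for _ in range(iters):
        grad_x = A @ y + delta * x  # dL/dx, descend
        grad_y = A.T @ x - delta * y  # dL/dy, ascend
        x, y = project_simplex(x - step * grad_x), project_simplex(y + step * grad_y)
    return x, y


if __name__ == "__main__":
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x, y = regularized_saddle_point(A, delta=0.5)
    print(x, y)  # analytic solution: x = (5/13, 8/13), y = (11/26, 15/26)
```

Without regularization ($\delta = 0$, for which this step rule degenerates) a matrix game can have a whole set of optimal strategies, for example when two rows of `A` coincide. The regularization removes this non-uniqueness at the price of shifting the equilibrium: for the example matrix the unregularized solution is $(0.4, 0.6)$ for both players.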