Off-Policy Model-Free Learning for Multi-Player Non-Zero-Sum Games With Constrained Inputs
Published in: | IEEE Transactions on Circuits and Systems I: Regular Papers, 2023-02, Vol. 70 (2), p. 910-920 |
---|---|
Main Authors: | |
Format: | Article |
Language: | English |
Summary: | In this paper, multi-player non-zero-sum games with control constraints are studied using a novel model-free approach based on the adaptive dynamic programming (ADP) framework. First, the model-based policy iteration (PI) method, which requires the system dynamics, is presented and its convergence is demonstrated. Then, to eliminate the need for the system dynamics, a model-free iterative method is derived by applying the off-policy integral reinforcement learning (IRL) scheme to the PI approach, with system data collected to construct the model-free formulation. The convergence of the off-policy IRL approach is analyzed by proving the equivalence between the model-free and model-based iterative approaches. In the implementation of the scheme, the control policies and cost functions are approximated by actor-critic networks, whose weights are learned from the collected data sets via the least-squares algorithm. Finally, two cases demonstrate the effectiveness of the established framework. |
ISSN: | 1549-8328, 1558-0806 |
DOI: | 10.1109/TCSI.2022.3221274 |
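
As background for the summary above: in the constrained-input setting, each player's stage cost typically includes a nonquadratic penalty that keeps the control within its bound, and the IRL step evaluates the value over an interval [t, t+T] without using a model. A minimal sketch, assuming standard notation from this literature rather than the paper's own (here \(\lambda\) is the input bound, \(R_j > 0\) a weighting matrix, and \(Q_i(x)\) player \(i\)'s state cost):

```latex
% Sketch only: a typical constrained-input penalty and the interval (IRL)
% Bellman equation for player i at PI iteration k (notation assumed,
% not taken from the paper).
W_j(u_j) = 2 \int_0^{u_j} \lambda \tanh^{-1}\!\left(v/\lambda\right)^{\top} R_j \, dv,
\qquad
V_i^{(k)}\big(x(t)\big)
  = \int_t^{t+T} \Big( Q_i(x) + \sum_{j=1}^{N} W_j\big(u_j^{(k)}\big) \Big)\, d\tau
  + V_i^{(k)}\big(x(t+T)\big).
```

With a linear critic \(V_i(x) \approx \phi(x)^{\top} w_i\), this interval equation becomes a linear regression in \(w_i\), which is what the summary's least-squares step solves from collected data. A hypothetical NumPy sketch (the feature map `phi` and the helper `fit_critic_weights` are illustrative assumptions, not the authors' code):

```python
import numpy as np

def fit_critic_weights(X_t, X_tT, cost_integrals, phi):
    """Batch least-squares fit of critic weights w so that
    phi(x_t) @ w - phi(x_{t+T}) @ w ~= integral of the stage cost over [t, t+T].

    X_t, X_tT      : (M, n) arrays of states at the start/end of each interval
    cost_integrals : (M,) numerically integrated stage costs
    phi            : feature map x -> (p,) basis vector (assumed, e.g. polynomial)
    """
    # Each collected interval contributes one row of the regression problem.
    A = np.array([phi(x0) - phi(x1) for x0, x1 in zip(X_t, X_tT)])  # (M, p)
    w, *_ = np.linalg.lstsq(A, cost_integrals, rcond=None)
    return w  # V(x) ~= phi(x) @ w

# Hypothetical usage with a quadratic basis on a 2-D state:
phi = lambda x: np.array([x[0]**2, x[0] * x[1], x[1]**2])
rng = np.random.default_rng(0)
X_t, X_tT = rng.normal(size=(50, 2)), rng.normal(size=(50, 2))
r = rng.random(50)  # stand-in for measured cost integrals
w = fit_critic_weights(X_t, X_tT, r, phi)
```

Because the regression uses only measured states and integrated costs, no drift or input dynamics appear in it, which is how the off-policy IRL step removes the model requirement.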