Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems 2022-08, Vol.33 (8), p.3872-3883
Main Authors:	Yang, Xindi, Zhang, Hao, Wang, Zhuping
Format:	Article
Language:	English
Subjects:	Algorithms; Asynchronous learning; Computer applications; Consensus control; Control algorithms; Control systems; Control theory; Convergence; data-based control; Discrete time systems; Dynamic programming; Functional analysis; Games; Heuristic algorithms; Learning; Machine learning; Multi-agent systems; Multiagent systems; Neural networks; nonzero-sum games; optimal distributed consensus control; Optimization; policy gradient (PG); reinforcement learning (RL); Real time; Reinforcement; Reinforcement learning; Stability analysis; Synchronization

Description
Summary:	This article investigates the optimal distributed consensus control problem for discrete-time multiagent systems with completely unknown dynamics and differences in computational ability. The problem can be viewed as solving nonzero-sum games with distributed reinforcement learning (RL), in which each agent is a player. First, to guarantee the real-time performance of the learning algorithms, a data-based distributed control algorithm is proposed for multiagent systems using offline system interaction data sets. Using the interaction data produced while a real-time system runs, the proposed algorithm improves system performance based on distributed policy gradient RL. Convergence and stability are guaranteed by functional analysis and the Lyapunov method. Second, to address the asynchronous learning caused by differences in computational ability among agents, the proposed algorithm is extended to an asynchronous version in which each agent decides whether to execute policy improvement independently of its neighbors. Furthermore, an actor-critic structure containing two neural networks is developed to implement the proposed algorithm in both the synchronous and asynchronous cases. Based on the method of weighted residuals, the convergence and optimality of the neural networks are guaranteed by proving that the approximation errors converge to zero. Finally, simulations are conducted to show the effectiveness of the proposed algorithm.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2021.3054685