Multiagent Path Finding Using Deep Reinforcement Learning Coupled With Hot Supervision Contrastive Loss
Published in: IEEE Transactions on Industrial Electronics (1982), 2023-07, Vol. 70(7), pp. 7032-7040
Main Authors: , , , , , ,
Format: Article
Language: English
Summary: Multiagent path finding (MAPF) is employed to find collision-free paths that guide agents from an initial position to a target position. Advanced decentralized approaches use communication between agents to improve performance in environments with high-density obstacles; however, this dramatically reduces the robustness of the multiagent system. To overcome this difficulty, we propose a novel method for solving MAPF problems. In this method, expert data are transformed into supervised signals through a proposed hot supervised contrastive loss, which is combined with reinforcement learning to train fully decentralized policies. Agents reactively plan paths online in a partially observable world while exhibiting implicit coordination without communicating with one another. We introduce a self-attention mechanism into the policy network, which improves its ability to extract collaborative information between agents from the observation data. Simulation experiments demonstrate that the learned policy achieves good performance without communication between agents, and real-world application experiments demonstrate the effectiveness of our method in practical settings.
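The summary describes turning expert demonstrations into supervised signals via a contrastive loss over agent observations. As a rough illustration only (the paper's exact formulation is not given in this record), a standard supervised contrastive term grouped by the expert's action labels might be sketched as follows; the function name, inputs, and temperature are all hypothetical:

```python
import numpy as np

def hot_supervised_contrastive_loss(embeddings, expert_actions, temperature=0.1):
    """Hypothetical sketch: pull together observation embeddings whose
    expert (supervised) action label matches, push apart the rest."""
    # L2-normalize embeddings so similarities are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    exp_sim = np.exp(sim)

    loss = 0.0
    for i, a in enumerate(expert_actions):
        positives = expert_actions == a  # same expert action = positive pair
        positives[i] = False
        if not positives.any():
            continue
        # log-probability of each positive against all other samples
        log_prob = sim[i, positives] - np.log(exp_sim[i].sum())
        loss -= log_prob.mean()
    return loss / len(expert_actions)
```

In the method the record summarizes, a term like this would be added to the reinforcement-learning objective, so the policy's representation is shaped by expert labels without any inter-agent communication at execution time.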
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2022.3206745