
Multiagent Path Finding Using Deep Reinforcement Learning Coupled With Hot Supervision Contrastive Loss


Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics (1982), 2023-07, Vol. 70 (7), p. 7032-7040
Main Authors: Chen, Lin; Wang, Yaonan; Mo, Yang; Miao, Zhiqiang; Wang, Hesheng; Feng, Mingtao; Wang, Sifei
Format: Article
Language: English
Description
Summary: Multiagent path finding (MAPF) seeks collision-free paths that guide agents from their initial positions to their target positions. Advanced decentralized approaches use communication between agents to improve performance in environments with high obstacle density. However, this reliance on communication dramatically reduces the robustness of multiagent systems. To overcome this difficulty, we propose a novel method for solving MAPF problems. In this method, expert data are transformed into supervised signals through a proposed hot supervised contrastive loss, which is combined with reinforcement learning to train fully decentralized policies. Agents reactively plan paths online in a partially observable world and exhibit implicit coordination without communicating with one another. We introduce a self-attention mechanism into the policy network, which improves its ability to extract collaborative information between agents from observation data. Simulation experiments demonstrate that the learned policy achieves good performance without communication between agents. Furthermore, real-world experiments demonstrate the effectiveness of our method in practical applications.
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2022.3206745