Reciprocal Velocity Obstacle Spatial-Temporal Network for Distributed Multirobot Navigation

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics, 2024-11, Vol. 71 (11), pp. 14470-14480
Main Authors: Chen, Lin; Wang, Yaonan; Miao, Zhiqiang; Feng, Mingtao; Zhou, Zhen; Wang, Hesheng; Wang, Danwei
Format: Article
Language: English
Description
Summary: The core of multirobot collision avoidance lies in developing a decentralized policy that guides robots from their initial positions to target locations, based on the environment states each robot perceives, while ensuring collision avoidance. However, current multirobot collision avoidance policy networks struggle to simultaneously extract the global spatial state, the temporal state, and the reciprocity among robots, which limits their performance. In this work, we develop a novel reciprocal velocity obstacle (RVO) spatial-temporal network and employ the proximal policy optimization (PPO) algorithm to train its parameters through interaction with a multirobot simulation environment. Specifically, we design a temporal state encoder module that represents the temporal characteristics of observation sequence data by combining a graph attention mechanism with a transformer encoding module. Furthermore, we design a reciprocal spatial state encoder module that uses a transformer encoding module to merge features from long short-term memory (LSTM), gated recurrent unit (GRU), and bidirectional GRU (BiGRU) branches, representing the spatial characteristics of RVO sequence data. Extensive simulation experiments demonstrate that the proposed method outperforms the state-of-the-art distributed reinforcement learning (RL) policy RL-RVO. We further conducted physical experiments with three Crazyflie quadcopters, showing that the method effectively guides agents' movements while avoiding collisions.
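The RVO concept the summary builds on can be illustrated with a minimal sketch for 2-D circular robots. This is not the paper's network; it only shows the geometric membership test behind reciprocal velocity obstacles: under RVO, both robots share avoidance responsibility equally, so a candidate velocity for robot A is forbidden if the ray cast along the adjusted relative velocity 2*v_cand - v_a - v_b intersects the disc of combined radius around robot B. The function name and signature are illustrative.

```python
def in_rvo(p_a, p_b, v_a, v_b, v_cand, r_a, r_b):
    """Return True if candidate velocity v_cand for robot A lies inside
    the reciprocal velocity obstacle RVO_A|B (2-D circular robots).

    v_cand is forbidden iff the ray from A along the adjusted relative
    velocity u = 2*v_cand - v_a - v_b intersects the disc of radius
    r_a + r_b centred on B (each robot takes half the avoidance effort).
    """
    # relative position of B with respect to A
    px, py = p_b[0] - p_a[0], p_b[1] - p_a[1]
    # adjusted relative velocity used by the RVO test
    ux = 2.0 * v_cand[0] - v_a[0] - v_b[0]
    uy = 2.0 * v_cand[1] - v_a[1] - v_b[1]
    r = r_a + r_b
    p_sq = px * px + py * py
    if p_sq <= r * r:                  # discs already overlap
        return True
    dot = px * ux + py * uy
    if dot <= 0:                       # ray points away from B: safe
        return False
    u_sq = ux * ux + uy * uy
    if u_sq == 0.0:                    # zero relative velocity: safe
        return False
    # squared perpendicular distance from B's centre to the ray
    dist_sq = p_sq - dot * dot / u_sq
    return dist_sq < r * r
```

For two robots approaching head-on (A at the origin moving right, B two units away moving left, radii 0.5 each), keeping the current velocity lies inside the RVO, while veering diagonally upward escapes it.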
ISSN: 0278-0046; 1557-9948
DOI: 10.1109/TIE.2024.3379630