Enhancing Multi-UAV Reconnaissance and Search Through Double Critic DDPG With Belief Probability Maps

Bibliographic Details
Published in:	IEEE Transactions on Intelligent Vehicles, 2024-02, Vol. 9 (2), pp. 3827-3842
Main Authors:	Zhang, Boquan; Lin, Xiang; Zhu, Yifan; Tian, Jing; Zhu, Zhi
Format:	Article
Language:	English
Subjects:	Autonomous aerial vehicles; belief probability map; bias and variance; Deep learning; double critic deep deterministic policy gradient; Markov processes; Multi-UAV; Multiple objective analysis; Optimization; Reconnaissance; Reconnaissance aircraft; reconnaissance and search; Reinforcement learning; Search problems; Searching; Training; Uncertainty; Unmanned aerial vehicles

Description
Summary:	Unmanned Aerial Vehicles (UAVs) have recently attracted significant attention due to their potential applications in reconnaissance and search. This paper investigates multi-UAV cooperative reconnaissance and search (MCRS), with the goals of ensuring ample coverage of the mission area and precise localization of static targets. The MCRS problem is modeled as a multi-objective optimization problem that takes into account the credibility of search results. To capture this credibility, we design a belief probability map based on Dempster-Shafer (DS) evidence theory, comprising an uncertainty map and two target maps. This representation clearly depicts both the presence of targets and the uncertainty within the map. We then reformulate the multi-objective optimization problem within the framework of a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). To solve this reformulation, we propose a new deep reinforcement learning approach called Double Critic Deep Deterministic Policy Gradient (DCDDPG). Specifically, we introduce both a centralized critic and a local critic for each UAV agent to estimate the action-value function. This design helps balance the bias in action-value estimation against the variance in policy updates, thereby improving coordination among the UAVs. Extensive simulation results demonstrate that DCDDPG outperforms existing techniques in terms of search efficiency and coverage.
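The summary states only that the belief probability map is built on Dempster-Shafer evidence theory and consists of an uncertainty map and two target maps. The sketch below shows one plausible reading, not the authors' implementation. It assumes each grid cell holds a mass function over the binary frame {T (target present), N (no target)}, that the three maps are read off as m({T}), m({N}) and m({T, N}), and that new sensor readings are fused with Dempster's rule of combination. The names `Mass`, `combine`, `update_map` and `split_maps`, along with the sensor mass values in the usage note, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mass:
    """Basic belief assignment for one cell over the frame {T, N} (hypothetical layout)."""
    target: float   # m({T}): evidence that a target is in the cell
    empty: float    # m({N}): evidence that the cell is empty
    unknown: float  # m({T, N}): uncommitted mass, i.e. uncertainty

    def __post_init__(self) -> None:
        total = self.target + self.empty + self.unknown
        if min(self.target, self.empty, self.unknown) < 0 or abs(total - 1.0) > 1e-9:
            raise ValueError(f"masses must be non-negative and sum to 1, got {total}")


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination on the binary frame {T, N}."""
    # Conflict K: one source supports T while the other supports N.
    conflict = m1.target * m2.empty + m1.empty * m2.target
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    norm = 1.0 - conflict
    target = (m1.target * m2.target + m1.target * m2.unknown + m1.unknown * m2.target) / norm
    empty = (m1.empty * m2.empty + m1.empty * m2.unknown + m1.unknown * m2.empty) / norm
    unknown = (m1.unknown * m2.unknown) / norm
    return Mass(target, empty, unknown)


def update_map(belief: list[list[Mass]], observations: dict[tuple[int, int], Mass]) -> None:
    """Fuse each observed cell's sensor evidence into the belief grid in place."""
    for (row, col), obs in observations.items():
        belief[row][col] = combine(belief[row][col], obs)


def split_maps(belief: list[list[Mass]]) -> tuple[list[list[float]], ...]:
    """Return (uncertainty map, target-present map, target-absent map)."""
    uncertainty = [[m.unknown for m in row] for row in belief]
    present = [[m.target for m in row] for row in belief]
    absent = [[m.empty for m in row] for row in belief]
    return uncertainty, present, absent
```

As a usage example, starting every cell from the vacuous belief `Mass(0.0, 0.0, 1.0)` (full uncertainty) and fusing a hypothetical detection `Mass(0.6, 0.1, 0.3)` twice into the same cell raises m({T}) from 0.6 to about 0.82 and lowers m({T, N}) from 0.3 to about 0.10. Repeated consistent observations therefore shrink the uncertainty map, which is the kind of signal a coverage-oriented reward could use.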
ISSN:	2379-8858; 2379-8904
DOI:	10.1109/TIV.2024.3352581