
Target Detection for USVs by Radar-Vision Fusion With Sway-Robust Distance-Aware Probabilistic Multimodal Data Association

Bibliographic Details
Published in: IEEE Sensors Journal, 2024-06, Vol. 24, No. 12, pp. 20177-20187
Main Authors: Li, Zhenglin, Yuan, Tianxin, Ma, Liyan, Zhou, Yang, Peng, Yan
Format: Article
Language:English
Description
Summary: Unmanned surface vehicles (USVs) have been deployed for a wide range of tasks over the past decades. Accurate perception of the surrounding environment on the water surface under complex conditions is crucial for USVs to conduct effective operations. This article proposes a radar-vision fusion framework for USVs to accurately detect typical targets on the water surface. The modality difference between images and radar measurements, along with their perpendicular coordinate systems, presents challenges in the fusion process. The swaying of USVs on the water and the extensive perception area further complicate multisensor data association. To address these problems, we propose two modules to enhance multisensor fusion performance: a movement-compensated projection module and a distance-aware probabilistic data association module. The former reduces projection bias when aligning radar and camera signals by compensating for sensor movement with the roll and pitch angles measured by the inertial measurement unit (IMU). The latter models the target region guided by each radar measurement as a bivariate Gaussian distribution whose covariance matrix is adaptively derived from the distance between the target and the camera. Consequently, the association of radar points and images is robust to projection errors and works well for multiscale objects. Features of radar points and images are then extracted with two parallel backbones and fused at multiple levels to provide sufficient semantic information for robust object detection. The proposed framework achieves an average precision (AP) of 0.501 on the challenging real-world dataset that we established, outperforming state-of-the-art vision-only and radar-vision fusion methods.
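
The two modules summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a pinhole camera model with intrinsics K, a known radar-to-camera transform T_radar_to_cam, and a hypothetical distance-to-spread mapping (sigma0, decay are placeholder parameters, not values from the paper). It only shows how IMU roll/pitch compensation and a distance-adaptive bivariate Gaussian association weight could fit together.

    # Hedged sketch (assumed names, not the paper's code).
    import numpy as np

    def roll_pitch_rotation(roll, pitch):
        """Rotation compensating USV sway from IMU roll/pitch (radians)."""
        cr, sr = np.cos(roll), np.sin(roll)
        cp, sp = np.cos(pitch), np.sin(pitch)
        R_x = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about x
        R_y = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch about y
        return R_y @ R_x

    def project_radar_point(p_radar, roll, pitch, T_radar_to_cam, K):
        """Project a 3-D radar measurement into the image after sway compensation."""
        p = roll_pitch_rotation(roll, pitch) @ p_radar               # undo roll/pitch
        p_cam = T_radar_to_cam[:3, :3] @ p + T_radar_to_cam[:3, 3]  # radar -> camera
        uvw = K @ p_cam                                              # camera -> pixel
        return uvw[:2] / uvw[2]                                      # (u, v)

    def distance_aware_association_weight(pixel_uv, center_uv, distance,
                                          sigma0=40.0, decay=0.02):
        """Bivariate Gaussian weight whose spread shrinks with target distance,
        so far (small) targets get tighter regions than near (large) ones."""
        sigma = sigma0 / (1.0 + decay * distance)   # assumed distance-adaptive std
        cov = np.diag([sigma**2, sigma**2])
        d = np.asarray(pixel_uv, float) - np.asarray(center_uv, float)
        return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)))

In such a scheme, the weights of candidate pixels around a projected radar point could be thresholded to form the association region; the constants above are purely illustrative.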
ISSN: 1530-437X
EISSN: 1558-1748
DOI: 10.1109/JSEN.2024.3394703