VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection

Bibliographic Details
Published in: IEEE Transactions on Intelligent Transportation Systems, 2024-08, Vol. 25 (8), p. 8527-8537
Main Authors: Wang, Chia-Hung, Chen, Hsueh-Wei, Chen, Yi, Hsiao, Pei-Yung, Fu, Li-Chen
Format: Article
Language:English
Description
Summary: Many LiDAR-based methods have been claimed to perform well when detecting large objects, handling a single object class, or operating under easy conditions. However, because they fail to exploit image semantics, their performance in detecting small targets or under challenging conditions does not match that of fusion-based approaches. To elevate detection performance in complex environments, this paper proposes a multi-modal, multi-class 3D object detection network named the Voxel-Pixel Fusion Network (VoPiFNet). Within this network, we design a key novel component called the Voxel-Pixel Fusion Layer, which exploits the geometric relation of a voxel-pixel pair and effectively fuses voxel features and pixel features with a cross-modal attention mechanism. Moreover, after considering the characteristics of the voxel-pixel pair, we design four parameters to guide and enhance this fusion effect. The proposed layer can be integrated with voxel-based 3D LiDAR detectors and 2D image detectors. Finally, the proposed method is evaluated on the public KITTI benchmark dataset for multi-class 3D object detection at different difficulty levels. Extensive experiments show that our method outperforms state-of-the-art methods in detecting the challenging pedestrian category and achieves promising performance in overall 3D mean average precision (mAP).
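The abstract describes fusing LiDAR voxel features with image pixel features through cross-modal attention. The paper's actual layer (including its four guiding parameters) is not specified here, so the following is only a minimal numpy sketch of generic cross-modal attention under stated assumptions: voxel features act as queries, pixel features as keys and values, and random projection matrices stand in for learned linear layers. All names (`cross_modal_attention_fusion`, `d_k`) are hypothetical, not from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention_fusion(voxel_feats, pixel_feats, d_k=32, seed=0):
    """Illustrative cross-modal attention: voxels attend to pixels.

    voxel_feats: (N_v, C_v) LiDAR voxel features (queries)
    pixel_feats: (N_p, C_p) image pixel features (keys/values)
    Returns fused features of shape (N_v, C_v + d_k): each voxel's
    original feature concatenated with its attended pixel feature.
    """
    N_v, C_v = voxel_feats.shape
    N_p, C_p = pixel_feats.shape

    # Random projections as placeholders for learned weights (assumption).
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((C_v, d_k)) / np.sqrt(C_v)
    Wk = rng.standard_normal((C_p, d_k)) / np.sqrt(C_p)
    Wv = rng.standard_normal((C_p, d_k)) / np.sqrt(C_p)

    Q = voxel_feats @ Wq                      # (N_v, d_k)
    K = pixel_feats @ Wk                      # (N_p, d_k)
    V = pixel_feats @ Wv                      # (N_p, d_k)

    # Scaled dot-product attention: each voxel weights all pixels.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)   # (N_v, N_p)
    attended = attn @ V                                # (N_v, d_k)

    # Fuse by concatenating LiDAR and image-derived features.
    return np.concatenate([voxel_feats, attended], axis=1)
```

In a real detector the projections would be trained end-to-end, and the geometric voxel-to-pixel correspondence from the calibration matrices would restrict or bias which pixels each voxel attends to, rather than attending over all pixels as in this sketch.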
ISSN:1524-9050
1558-0016
DOI:10.1109/TITS.2024.3392783