VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection
Published in: IEEE Transactions on Intelligent Transportation Systems, 2024-08, Vol. 25 (8), pp. 8527-8537
Main Authors: , , , ,
Format: Article
Language: English
Summary: Many LiDAR-based methods have been claimed to perform well when detecting large objects, handling a single object class, or operating in easy conditions. However, because they fail to exploit image semantics, their performance on small targets or under challenging conditions does not match that of fusion-based approaches. To improve detection in complex environments, this paper proposes a multi-modal, multi-class 3D object detection network named the Voxel-Pixel Fusion Network (VoPiFNet). Its key novel component, the Voxel-Pixel Fusion Layer, exploits the geometric relation of each voxel-pixel pair and fuses voxel features with pixel features through a cross-modal attention mechanism. In addition, based on the characteristics of the voxel-pixel pair, four parameters are designed to guide and strengthen the fusion. The proposed layer can be integrated with voxel-based 3D LiDAR detectors and 2D image detectors. The method is evaluated on the public KITTI benchmark for multi-class 3D object detection at different difficulty levels. Extensive experiments show that it outperforms state-of-the-art methods on the challenging pedestrian category and achieves promising overall 3D mean average precision (mAP).
ISSN: 1524-9050, 1558-0016
DOI: 10.1109/TITS.2024.3392783
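The summary names the central mechanism, cross-modal attention between voxel and pixel features, without giving its details. As a rough illustration only, and not the paper's actual Voxel-Pixel Fusion Layer, the sketch below shows the general technique in PyTorch: every name, feature dimension, and projection here is an assumption, and the paper's geometric voxel-pixel pairing and four guiding parameters are omitted.

```python
import torch
import torch.nn as nn


class VoxelPixelFusion(nn.Module):
    """Cross-modal attention that lets each voxel query image pixels.

    Hypothetical sketch only: the feature dimensions, linear projections,
    and concatenation-based output are assumptions, not the paper's layer.
    """

    def __init__(self, voxel_dim=128, pixel_dim=64, attn_dim=128):
        super().__init__()
        self.q = nn.Linear(voxel_dim, attn_dim)  # queries from voxel features
        self.k = nn.Linear(pixel_dim, attn_dim)  # keys from pixel features
        self.v = nn.Linear(pixel_dim, attn_dim)  # values from pixel features
        self.out = nn.Linear(voxel_dim + attn_dim, voxel_dim)
        self.scale = attn_dim ** -0.5

    def forward(self, voxel_feats, pixel_feats):
        # voxel_feats: (N_v, voxel_dim); pixel_feats: (N_p, pixel_dim)
        q = self.q(voxel_feats)                               # (N_v, attn_dim)
        k = self.k(pixel_feats)                               # (N_p, attn_dim)
        v = self.v(pixel_feats)                               # (N_p, attn_dim)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (N_v, N_p)
        context = attn @ v                # image context gathered per voxel
        return self.out(torch.cat([voxel_feats, context], dim=-1))


# Usage with random stand-in features (shapes are illustrative):
voxels = torch.randn(2000, 128)  # non-empty voxel features from a LiDAR branch
pixels = torch.randn(4096, 64)   # flattened image feature-map locations
fused = VoxelPixelFusion()(voxels, pixels)  # (2000, 128) image-enriched voxels
```

In this reading, attention weights decide how much each image location contributes to a given voxel, which is one way a network can inject image semantics into a voxel-based LiDAR detector as the summary describes.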