
VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection


Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 10597-10609
Main Authors: Wang, Lin; Sun, Shiliang; Zhao, Jing
Format: Article
Language: English
Description
Summary: LiDAR and cameras are the most commonly used sensors for perceiving road scenes in autonomous driving. Current methods try to fuse the complementary information from the two sensors to improve 3D object detection. However, two pressing problems remain for multi-modality 3D object detection. One is detecting objects with sparse point clouds. The other is the misalignment between sensors caused by their fixed physical locations. This paper therefore argues that explicitly fusing information from the two modalities in the presence of this physical misalignment is suboptimal for multi-modality 3D object detection. It presents a novel virtual point generation network, VirPNet, to overcome these multi-modality fusion challenges. On the one hand, it completes sparse point cloud objects using the image source, which improves the final detection accuracy. On the other hand, it detects 3D targets directly from raw point clouds to avoid the physical misalignment between LiDAR and camera sensors. Unlike previous point cloud completion methods, VirPNet fully utilizes the geometric information of pixels and point clouds and simplifies 3D point cloud regression into a 2D distance regression problem through a virtual plane. Experimental results on the KITTI 3D object detection dataset and the nuScenes dataset demonstrate that VirPNet improves detection accuracy with the help of the generated virtual points.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2024.3410117