
MVMM: Multi-View Multi-Modal 3D Object Detection for Autonomous Driving

Bibliographic Details
Published in: IEEE Transactions on Industrial Informatics, 2023-04, p. 1-9
Main Authors: Li, Shangjie, Geng, Keke, Yin, Guodong, Wang, Ziwei, Qian, Min
Format: Article
Language:English
Description
Summary: Object detection in 3D space is a fundamental technology in autonomous driving systems. Among published 3D object detection methods, single-modal methods based on point clouds have been widely studied. One problem exposed by these methods is that point clouds lack color and texture features; this limitation in conveying semantic information often leads to detection failures. In contrast, multi-modal methods based on image-point cloud fusion may solve this problem, but relevant research is insufficient. In this work, a single-stage multi-view multi-modal 3D object detector (MVMM) is proposed, which naturally and efficiently extracts semantic and geometric information from the image and point clouds. Specifically, a data-level fusion approach, point cloud coloring, is used to combine information from the camera and LiDAR. Next, an encoder-decoder backbone is devised to extract features from the colored points in the range view. Then, the colored points are concatenated with the range-view features, voxelized, and fed into the point-view bridge for down-sampling. Finally, the down-sampled feature map is used by the bird's-eye-view backbone and the detection head to generate 3D results based on predefined anchors. In extensive experiments on the KITTI dataset, MVMM achieves competitive performance while running at 27 FPS on a 1080 Ti GPU. In particular, MVMM performs extremely well in difficult scenes (e.g., heavy occlusion and truncation) due to its understanding of the fused information.
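The data-level fusion step described in the abstract, point cloud coloring, amounts to projecting each LiDAR point into the camera image and attaching the RGB value it lands on. A minimal sketch follows; the function name, calibration-matrix format, and pinhole-projection details are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def color_points(points, image, T_cam_lidar, K):
    """Attach RGB from a camera image to LiDAR points (data-level fusion sketch).

    points:      (N, 3) xyz coordinates in the LiDAR frame
    image:       (H, W, 3) uint8 RGB camera image
    T_cam_lidar: (4, 4) homogeneous LiDAR-to-camera extrinsic transform
    K:           (3, 3) camera intrinsic matrix
    Returns an (M, 6) array [x, y, z, r, g, b] for the points that
    fall inside the image; points outside the camera view are dropped.
    """
    H, W = image.shape[:2]
    # Transform points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0  # keep only points in front of the camera
    # Pinhole projection to pixel coordinates.
    uvw = (K @ cam.T).T
    uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Sample RGB at the projected pixel and normalize to [0, 1].
    rgb = image[v[valid], u[valid]].astype(np.float32) / 255.0
    return np.hstack([points[valid], rgb])
```

The colored points produced this way carry both geometry (xyz) and appearance (rgb), and can then be rendered into the range view for the encoder-decoder backbone the abstract describes.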
ISSN:1551-3203
DOI:10.1109/TII.2023.3263274