PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection

Bibliographic Details
Published in: IEEE Transactions on Intelligent Vehicles, 2024, pp. 1-10
Main Authors: Mao, Weixin, Wang, Tiancai, Zhang, Diankun, Yan, Junjie, Yoshie, Osamu
Format: Article
Language: English
Description
Summary: Pillar-based 3D object detectors mainly employ a randomly initialized 2D convolutional neural network (ConvNet) for feature extraction, and therefore fail to benefit from the backbone scaling and pretraining that have proven effective in the image domain. This paper shows the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors. For better backbone scaling, we first introduce several design principles for point cloud backbones that tackle the sparsity of point clouds and improve the effective receptive field. Backbone scaling is then achieved by adapting the backbone design to the target model size. For backbone pretraining, we propose a weight adaptation module that transfers the knowledge obtained by pretraining on large-scale image datasets to the point cloud domain. Our proposed pillar-based detector, termed PillarNeSt, outperforms existing 3D object detectors by a large margin on the nuScenes and Argoverse 2 datasets. Code is released at https://github.com/WayneMao/PillarNeSt .
ISSN: 2379-8858, 2379-8904
DOI: 10.1109/TIV.2024.3386576
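The summary does not say how the weight adaptation module works internally. As a rough illustration only, the following sketch shows one generic, commonly used way to reuse a pretrained image ConvNet's first-layer filters when the input has a different number of channels than RGB (for example, a pillar pseudo-image): tile the RGB filters across the new channels and rescale them. The function name `adapt_input_conv`, the tile-and-rescale rule, and the `(out_channels, in_channels, kh, kw)` weight layout are assumptions for illustration; this is not PillarNeSt's actual module.

```python
import numpy as np


def adapt_input_conv(weight: np.ndarray, in_chans: int) -> np.ndarray:
    """Hypothetical helper: adapt a pretrained conv weight of shape
    (out_channels, 3, kh, kw) to a layer with `in_chans` input channels.

    This is a generic image-to-new-modality trick, NOT the paper's
    weight adaptation module.
    """
    out_c, rgb_c, kh, kw = weight.shape
    if in_chans == rgb_c:
        # Nothing to adapt; return a copy so the original stays untouched.
        return weight.copy()
    if in_chans == 1:
        # Collapse the RGB filters into a single input channel by summing.
        return weight.sum(axis=1, keepdims=True)
    # Repeat the RGB filters along the input-channel axis until there are
    # at least `in_chans` channels, then keep the first `in_chans` of them.
    reps = -(-in_chans // rgb_c)  # ceiling division
    tiled = np.tile(weight, (1, reps, 1, 1))[:, :in_chans]
    # Each output now sums over in_chans inputs instead of rgb_c, so scale
    # down to keep the activation magnitude roughly unchanged.
    return tiled * (rgb_c / in_chans)


if __name__ == "__main__":
    pretrained = np.random.randn(64, 3, 7, 7)  # e.g. an image stem conv
    adapted = adapt_input_conv(pretrained, 32)
    print(adapted.shape)  # (64, 32, 7, 7)
```

The rescaling step is the key design choice: without it, a layer fed many more channels than it was pretrained on would produce much larger activations, which can destabilize fine-tuning.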