PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection
Published in: | IEEE transactions on intelligent vehicles 2024, p.1-10 |
---|---|
Format: | Article |
Language: | English |
Summary: | Pillar-based 3D object detectors mainly employ randomly initialized 2D convolutional neural networks (ConvNets) for feature extraction and therefore miss the benefits of backbone scaling and pretraining established in the image domain. This paper shows the effectiveness of 2D backbone scaling and pretraining for pillar-based 3D object detectors. For better backbone scaling, we first introduce several design principles for point cloud backbones that address the sparsity of point clouds and improve the effective receptive field. Backbone scaling is then achieved by adaptively designing the backbone according to the model size. For backbone pretraining, we propose a weight adaptation module that transfers the image knowledge obtained by pretraining on large-scale image datasets to the point cloud domain. Our proposed pillar-based detector, termed PillarNeSt, outperforms existing 3D object detectors by a large margin on the nuScenes and Argoverse 2 datasets. Code is released at https://github.com/WayneMao/PillarNeSt . |
---|---|
ISSN: | 2379-8858 2379-8904 |
DOI: | 10.1109/TIV.2024.3386576 |