
POS-BERT: Point cloud one-stage BERT pre-training

Bibliographic Details
Published in: Expert Systems with Applications, 2024-04, Vol. 240, p. 122563, Article 122563
Main Authors: Fu, Kexue; Gao, Peng; Liu, Shaolei; Qu, Linhao; Gao, Longxiang; Wang, Manning
Format: Article
Language: English
Description
Summary: Recently, the pre-training paradigm combining the Transformer with masked language modeling, as in BERT, has achieved tremendous success not only in NLP but also on images and point clouds. However, directly extending BERT from NLP to point clouds requires first training a discrete Variational AutoEncoder (dVAE) as the tokenizer, which results in a complex two-stage process, as in Point-BERT. Inspired by BERT and MoCo, we propose POS-BERT, a one-stage BERT pre-training method for point clouds. Specifically, we use a masked patch modeling (MPM) task to perform point cloud pre-training, which aims to recover masked patch information under the supervision of the tokenizer's output. Unlike Point-BERT, whose tokenizer is trained separately and then frozen, we propose a momentum tokenizer that is dynamically updated while the Transformer is being trained. Furthermore, to learn better high-level semantic representations, we integrate contrastive learning into the proposed framework to maximize class-token consistency between augmented point cloud pairs. Experiments show that POS-BERT achieves state-of-the-art performance on linear SVM classification of ModelNet40 with a fixed feature extractor, exceeding Point-BERT by 3.5%. In addition, POS-BERT significantly improves many downstream tasks, including fine-tuned classification, few-shot classification and part segmentation. The code and trained models will be released at https://github.com/fukexue/POS-BERT.git.

Highlights:
• Proposes a point cloud one-stage BERT-style pre-training method.
• Uses a momentum tokenizer to provide continuous and dynamic supervision signals.
• Does not require an extra training stage.
• Uses contrastive learning to learn better high-level semantic representations.
• Achieves the best performance on multiple downstream tasks.
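The following is a minimal sketch of the momentum-tokenizer and masked-patch-modeling idea described in the abstract above, not the released implementation: the tokenizer is an exponential-moving-average (EMA) copy of the online encoder, and the MPM loss aligns the online network's predictions at masked patch positions with the tokenizer's outputs on the unmasked input. All class, parameter, and variable names here are illustrative assumptions; the point-cloud patchification, the class-token contrastive term, and the actual architecture are omitted.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class MomentumTokenizerMPM(nn.Module):
    """Hypothetical sketch: EMA ("momentum") tokenizer plus a masked-patch-modeling loss."""

    def __init__(self, encoder: nn.Module, dim: int, momentum: float = 0.999):
        super().__init__()
        self.online = encoder                      # trained by backprop
        self.tokenizer = copy.deepcopy(encoder)    # momentum tokenizer
        for p in self.tokenizer.parameters():
            p.requires_grad = False                # updated only by EMA, never by SGD
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.m = momentum

    @torch.no_grad()
    def update_tokenizer(self):
        # EMA update, called once per optimizer step: tokenizer <- m * tokenizer + (1 - m) * online
        for p_t, p_o in zip(self.tokenizer.parameters(), self.online.parameters()):
            p_t.data.mul_(self.m).add_(p_o.data, alpha=1.0 - self.m)

    def mpm_loss(self, patch_emb: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        """patch_emb: (B, N, C) patch embeddings; mask: (B, N) bool, True = masked."""
        # Online branch sees the masked input: masked positions get a learnable mask token.
        masked_in = torch.where(mask.unsqueeze(-1),
                                self.mask_token.expand_as(patch_emb), patch_emb)
        pred = self.online(masked_in)
        with torch.no_grad():
            target = self.tokenizer(patch_emb)     # tokenizer sees the full, unmasked input
        # Match predictions to tokenizer outputs at masked positions only;
        # cosine distance is one plausible choice of matching loss.
        pred = F.normalize(pred[mask], dim=-1)
        tgt = F.normalize(target[mask], dim=-1)
        return (1.0 - (pred * tgt).sum(dim=-1)).mean()


# Toy usage with a stand-in encoder (a real model would use a point Transformer):
enc = nn.Sequential(nn.Linear(32, 32), nn.GELU(), nn.Linear(32, 32))
model = MomentumTokenizerMPM(enc, dim=32)
x = torch.randn(2, 64, 32)                         # 2 clouds, 64 patches, 32-d embeddings
mask = torch.rand(2, 64) < 0.6                     # mask roughly 60% of the patches
loss = model.mpm_loss(x, mask)
loss.backward()
model.update_tokenizer()
```

Because the tokenizer is an EMA copy of the network being trained, the supervision signal evolves with the encoder instead of coming from a separately pre-trained, frozen dVAE, which is what makes the pre-training one-stage in the sense claimed by the paper.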
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.122563