POS-BERT: Point cloud one-stage BERT pre-training
Published in: Expert Systems with Applications, 2024-04, Vol. 240, p. 122563, Article 122563
Format: Article
Language: English
Summary: Recently, the pre-training paradigm combining the Transformer with BERT-style masked language modeling has achieved tremendous success not only in NLP, but also on images and point clouds. However, directly extending BERT from NLP to point clouds requires first training a discrete Variational AutoEncoder (dVAE) as the tokenizer, which results in a complex two-stage process, as in Point-BERT. Inspired by BERT and MoCo, we propose POS-BERT, a one-stage BERT pre-training method for point clouds. Specifically, we use a masked patch modeling (MPM) task for point cloud pre-training, which aims to recover masked patch information under the supervision of a tokenizer’s output. Unlike Point-BERT, whose tokenizer is trained separately and then frozen, we propose a momentum tokenizer that is dynamically updated while the Transformer is trained. Furthermore, to learn better high-level semantic representations, we integrate contrastive learning into the proposed framework to maximize the class-token consistency between augmented point cloud pairs. Experiments show that POS-BERT achieves state-of-the-art performance on linear SVM classification on ModelNet40 with a fixed feature extractor, exceeding Point-BERT by 3.5%. In addition, POS-BERT significantly improves many downstream tasks, including fine-tuned classification, few-shot classification, and part segmentation. The code and trained models will be released at https://github.com/fukexue/POS-BERT.git.
Highlights:
• Proposes a point cloud one-stage BERT-style pre-training method.
• Uses a momentum tokenizer to provide continuous and dynamic supervision signals.
• Requires no extra tokenizer training step.
• Uses contrastive learning to learn better high-level semantic representations.
• Achieves the best performance on multiple downstream tasks.
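The summary above describes two training signals: a momentum tokenizer that supervises masked patch modeling, and a contrastive term that aligns the class tokens of two augmented views. As a rough illustration only, not the authors' released code, the sketch below shows how an EMA tokenizer update and a combined MPM + class-token contrastive loss could look in PyTorch; the names (`momentum_update`, `pos_bert_loss`, `ema_decay`, `tau`) and the assumption that the tokenizer mirrors the student Transformer's architecture are illustrative.

```python
# Minimal sketch of the two ideas named in the abstract: an EMA-updated
# momentum tokenizer and an MPM loss combined with a class-token
# contrastive term. Module and argument names are assumptions, not the
# authors' actual API.
import torch
import torch.nn.functional as F


@torch.no_grad()
def momentum_update(student: torch.nn.Module, tokenizer: torch.nn.Module,
                    ema_decay: float = 0.999) -> None:
    """Update the momentum tokenizer as an exponential moving average of the
    student Transformer's weights (instead of a separately trained, frozen dVAE).
    Assumes both modules share the same parameter layout."""
    for p_s, p_t in zip(student.parameters(), tokenizer.parameters()):
        p_t.data.mul_(ema_decay).add_(p_s.data, alpha=1.0 - ema_decay)


def pos_bert_loss(masked_pred: torch.Tensor, token_target: torch.Tensor,
                  cls_a: torch.Tensor, cls_b: torch.Tensor,
                  tau: float = 0.1) -> torch.Tensor:
    """MPM term: match predictions for masked patches against the momentum
    tokenizer's output distribution. Contrastive term: InfoNCE-style loss
    pulling together the class tokens of two augmented views of the same
    point cloud within a batch."""
    # Masked patch modeling: soft cross-entropy against the tokenizer output
    mpm = torch.sum(-F.softmax(token_target, dim=-1)
                    * F.log_softmax(masked_pred, dim=-1), dim=-1).mean()

    # Class-token consistency across augmented pairs
    cls_a = F.normalize(cls_a, dim=-1)
    cls_b = F.normalize(cls_b, dim=-1)
    logits = cls_a @ cls_b.t() / tau
    labels = torch.arange(cls_a.size(0), device=cls_a.device)
    contrastive = F.cross_entropy(logits, labels)

    return mpm + contrastive
```

The actual masking strategy, update schedule, and loss weighting are defined in the paper and in the repository linked above; this sketch only conveys the one-stage idea of supervising masked patches with a dynamically updated tokenizer rather than a pre-trained, frozen one.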
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.122563