A Novel SO(3) Rotational Equivariant Masked Autoencoder for 3D Mesh Object Analysis

Bibliographic Details
Published in:	IEEE transactions on circuits and systems for video technology 2024-09, p.1-1
Main Authors:	Xie, Min; Zhao, Jieyu; Shen, Kedi
Format:	Article
Language:	English
Subjects:	3D mesh masked autoencoder; Convolution; Kernel; Neurons; Point cloud compression; pose transformation estimation; rotation-equivariance; sparse token; Three-dimensional displays; Transformers; Vectors

Description
Summary:	Equivariant networks have recently made significant strides in computer vision tasks related to robotic grasping, molecule generation, and 6D pose tracking. In this paper, we explore 3D mesh object analysis based on an equivariant masked autoencoder to reduce the model's dependence on large datasets and to predict the pose transformation. We use 3D reconstruction tasks under rotation and masking operations, such as segmentation tasks after rotation, as pretraining to enhance downstream task performance. To reduce the computational complexity of the algorithm, we first use multiple non-overlapping 3D mesh patches with a fixed face size. We then design a rotation-equivariant self-attention mechanism to obtain advanced features. To improve the throughput of the encoder, we design a sparse token merging strategy. Our method achieves comparable performance on equivariant analysis tasks of mesh objects, such as 3D mesh pose transformation estimation, object classification, and part segmentation, on the ShapeNetCore16, Manifold40, COSEG-aliens, COSEG-vases, and Human Body datasets. In the object classification task, we achieve superior performance even when only 10% of the original sample is used. We perform extensive ablation experiments to demonstrate the efficacy of critical design choices in our approach.
ISSN:	1051-8215 1558-2205
DOI:	10.1109/TCSVT.2024.3465041