Local Shuffled Skeleton Position Embedding Vision Transformer for human activity recognition

Vision Transformers (ViTs) in human activity recognition tasks suffer from inadequate spatial modeling through conventional position embeddings, leading to over-reliance on fixed positional information. This paper proposes Shuffled Positional Embedding (SPE), a mechanism that randomly disrupts the o...

Bibliographic Details
Main Authors: Zihui Yan, Xiyu Shi, Varuna De-Silva
Format: Default Conference proceeding
Published: 2025
Online Access: https://hdl.handle.net/2134/30601004.v1