Local Shuffled Skeleton Position Embedding Vision Transformer for human activity recognition
Vision Transformers (ViTs) in human activity recognition tasks suffer from inadequate spatial modeling through conventional position embeddings, leading to over-reliance on fixed positional information. This paper proposes Shuffled Positional Embedding (SPE), a mechanism that randomly disrupts the o...
| Main Authors: | , , |
|---|---|
| Format: | Conference proceeding |
| Published: | 2025 |
| Online Access: | https://hdl.handle.net/2134/30601004.v1 |