Visual-semantic Alignment Temporal Parsing for Action Quality Assessment
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-10, pp. 1-1
Format: Article
Language: English
Summary: Action Quality Assessment (AQA) is a challenging task that involves analyzing fine-grained technical subactions, aligning high-level visual-semantic representations, and exploring the internal temporal structures that capture the overall meaning of a given action sequence. To address these challenges, we propose a Visual-semantic Alignment Temporal Parsing Network (VATP-Net) that learns the high-level visual semantics of subaction sequences and their internal temporal structures without explicit supervision for action quality assessment. The approach introduces a self-supervised temporal parsing module that generates subaction sequences from a given video by aligning visual and semantic action features, capturing both the high-level semantics and the internal temporal dynamics of those sequences. Furthermore, a multimodal interaction module captures the interactions between different modalities of action features, enabling a comprehensive assessment of fine-grained and scene-invariant action details. This module models the intricate relationships and encourages interaction between modalities within an action sequence, enhancing the overall understanding needed for assessment. We evaluate the proposed approach exhaustively on the MTL-AQA, Rhythmic Gymnastics (RG), FineFS, and Fis-V datasets. Extensive experimental results demonstrate its effectiveness and feasibility: it outperforms state-of-the-art methods by a significant margin.
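The alignment step described in the summary, matching clip-level visual features against semantic subaction representations to induce a temporal parse, can be pictured with a short sketch. The PyTorch code below is a minimal illustration only, not the authors' VATP-Net: the class name `VisualSemanticAlignment`, the learnable subaction embeddings, the feature shapes, and the softmax temperature are all assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticAlignment(nn.Module):
    """Toy sketch of visual-semantic alignment for temporal parsing.

    Each video clip's visual feature is compared (cosine similarity)
    against learnable per-subaction semantic embeddings; a softmax over
    subactions gives a soft temporal parse. Hypothetical shapes and
    hyperparameters; the paper's actual module differs.
    """
    def __init__(self, feat_dim=512, num_subactions=8, temperature=0.07):
        super().__init__()
        # One learnable semantic prototype per subaction (assumed design).
        self.subaction_emb = nn.Parameter(torch.randn(num_subactions, feat_dim))
        self.temperature = temperature

    def forward(self, clip_feats):
        # clip_feats: (B, T, D) per-clip visual features from a video backbone.
        v = F.normalize(clip_feats, dim=-1)          # (B, T, D)
        s = F.normalize(self.subaction_emb, dim=-1)  # (K, D)
        sim = torch.einsum('btd,kd->btk', v, s)      # cosine similarity (B, T, K)
        # Soft assignment of each clip to a subaction = soft temporal parse.
        assign = F.softmax(sim / self.temperature, dim=-1)
        return assign, sim

if __name__ == "__main__":
    feats = torch.randn(2, 16, 512)   # 2 videos, 16 clips, 512-d features
    parser = VisualSemanticAlignment()
    assign, sim = parser(feats)
    print(assign.shape)               # torch.Size([2, 16, 8])
```

In a full model, the soft assignments would drive a self-supervised parsing objective and feed a quality-regression head; here they only show how cosine similarity between normalized visual and semantic vectors yields a per-clip subaction distribution.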
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3487242