Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection
Published in: Signal, Image and Video Processing, 2022-06, Vol. 16 (4), pp. 947-954
Main Authors:
Format: Article
Language: English
Summary: Weakly supervised temporal action detection is an important and challenging task: the temporal intervals of actions must be detected and their categories identified using only video-level labels. Correctly identifying the transition state between action and background improves detection accuracy; this paper therefore focuses on filtering out the transition state and proposes a two-stream graph convolutional neural network fusion for weakly supervised temporal action detection. The transition state generally changes prominently and lasts only a short time, characteristics that distinguish it from the action itself. The feature difference between two video segments with temporal interaction indicates whether a segment belongs to the transition state. Based on the feature similarity and temporal correlation of the segments, a semantic similarity weighted graph and a transition-aware temporal correlation graph are then constructed. Finally, a temporal attention sequence over the video segments is extracted from the fused two-stream graph features. The attention-weighted feature representation is fed to a linear classifier to generate the class activation sequence, from which temporal action detection is performed. Experimental results on shared datasets show that the proposed method effectively improves action detection performance.
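
The summary sketches a concrete pipeline: build two graphs over the video segments (one weighted by semantic similarity, one by transition-aware temporal correlation), pass each through a GCN stream, fuse the streams, derive a temporal attention sequence, and feed attention-weighted features to a linear classifier that produces the class activation sequence. The following is a minimal illustrative sketch of that pipeline, not the paper's implementation: the names (`TwoStreamGCNFusion`, `semantic_graph`, `transition_graph`), the cosine-similarity and exponential-decay edge weightings, the single GCN layer per stream, the concatenation fusion, and the top-k video-level pooling are all assumptions made for the example.

```python
# Illustrative sketch of the two-stream graph pipeline described in the
# summary. All module names, dimensions, and the exact edge weightings
# are assumptions; the paper's precise formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    adj = adj + torch.eye(adj.size(0), device=adj.device)
    d_inv_sqrt = adj.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def semantic_graph(feats: torch.Tensor) -> torch.Tensor:
    """Semantic similarity weighted graph: cosine similarity of segment features."""
    f = F.normalize(feats, dim=1)
    return (f @ f.t()).clamp(min=0)


def transition_graph(feats: torch.Tensor, window: int = 2) -> torch.Tensor:
    """Transition-aware temporal graph (assumed form): neighboring segments
    are connected, and edges across large feature differences, i.e. candidate
    transition states, are down-weighted."""
    t = feats.size(0)
    adj = torch.zeros(t, t, device=feats.device)
    for i in range(t):
        for j in range(max(0, i - window), min(t, i + window + 1)):
            if i != j:
                diff = (feats[i] - feats[j]).norm()
                adj[i, j] = torch.exp(-diff)  # small weight across abrupt changes
    return adj


class TwoStreamGCNFusion(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        self.w_sem = nn.Linear(in_dim, hid_dim)         # semantic-stream GCN weights
        self.w_tmp = nn.Linear(in_dim, hid_dim)         # temporal-stream GCN weights
        self.attn = nn.Linear(2 * hid_dim, 1)           # temporal attention head
        self.cls = nn.Linear(2 * hid_dim, num_classes)  # linear classifier

    def forward(self, feats: torch.Tensor):
        a_sem = normalize_adj(semantic_graph(feats))
        a_tmp = normalize_adj(transition_graph(feats))
        h_sem = F.relu(a_sem @ self.w_sem(feats))  # one GCN layer per stream
        h_tmp = F.relu(a_tmp @ self.w_tmp(feats))
        h = torch.cat([h_sem, h_tmp], dim=1)       # fused two-stream feature
        attn = torch.sigmoid(self.attn(h))         # (T, 1) temporal attention sequence
        cas = self.cls(attn * h)                   # (T, C) class activation sequence
        # Video-level logits for weak supervision: mean of the top-k activations.
        k = max(1, feats.size(0) // 8)
        video_logits = cas.topk(k, dim=0).values.mean(dim=0)
        return cas, attn, video_logits


# Usage: 100 segments of 1024-D pre-extracted features (e.g., I3D).
model = TwoStreamGCNFusion(in_dim=1024, hid_dim=256, num_classes=20)
cas, attn, video_logits = model(torch.randn(100, 1024))
```

At inference, action intervals would be obtained in the usual way for weakly supervised detection: threshold the attention and class activation sequences and group consecutive above-threshold segments into proposals.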
ISSN: 1863-1703 (print); 1863-1711 (electronic)
DOI: 10.1007/s11760-021-02039-5