Two-stream graph convolutional neural network fusion for weakly supervised temporal action detection
Published in: Signal, Image and Video Processing, 2022-06, Vol. 16 (4), pp. 947-954
Main Authors:
Format: Article
Language: English
Summary: Weakly supervised temporal action detection is an important and challenging task: the temporal intervals of actions must be detected and their categories identified using only video-level labels. Correctly identifying the transition state between action and background improves detection accuracy; this paper therefore focuses on filtering out the transition state and proposes a two-stream graph convolutional neural network fusion for weakly supervised temporal action detection. The transition state generally changes prominently and lasts only a short time, characteristics that distinguish it from the action itself. The feature difference between two video segments with temporal interaction indicates whether a segment belongs to the transition state. Based on the feature similarity and temporal correlation of the segments, a semantic similarity weighted graph and a transition-aware temporal correlation graph are then constructed. Finally, a temporal attention sequence over the video segments is extracted from the fused two-stream graph features. The attention-weighted feature representation is fed to a linear classifier to generate the class activation sequence, from which temporal action detection is performed. Experimental results on shared datasets show that the proposed method effectively improves action detection performance.
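
The summary sketches a concrete pipeline: build two graphs over the video segments (one weighted by semantic similarity, one by transition-aware temporal correlation), pass each through a GCN stream, fuse the streams, derive a temporal attention sequence, and feed attention-weighted features to a linear classifier that produces the class activation sequence. The following is a minimal illustrative sketch of that pipeline, not the paper's implementation: the names (`TwoStreamGCNFusion`, `semantic_graph`, `transition_graph`), the cosine-similarity and exponential-decay edge weightings, the single GCN layer per stream, the concatenation fusion, and the top-k video-level pooling are all assumptions made for the example.

```python
# Illustrative sketch of the two-stream graph pipeline described in the
# summary. All module names, dimensions, and the exact edge weightings
# are assumptions; the paper's precise formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    adj = adj + torch.eye(adj.size(0), device=adj.device)
    d_inv_sqrt = adj.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def semantic_graph(feats: torch.Tensor) -> torch.Tensor:
    """Semantic similarity weighted graph: cosine similarity of segment features."""
    f = F.normalize(feats, dim=1)
    return (f @ f.t()).clamp(min=0)


def transition_graph(feats: torch.Tensor, window: int = 2) -> torch.Tensor:
    """Transition-aware temporal graph (assumed form): neighboring segments
    are connected, and edges across large feature differences, i.e. candidate
    transition states, are down-weighted."""
    t = feats.size(0)
    adj = torch.zeros(t, t, device=feats.device)
    for i in range(t):
        for j in range(max(0, i - window), min(t, i + window + 1)):
            if i != j:
                diff = (feats[i] - feats[j]).norm()
                adj[i, j] = torch.exp(-diff)  # small weight across abrupt changes
    return adj


class TwoStreamGCNFusion(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int, num_classes: int):
        super().__init__()
        self.w_sem = nn.Linear(in_dim, hid_dim)         # semantic-stream GCN weights
        self.w_tmp = nn.Linear(in_dim, hid_dim)         # temporal-stream GCN weights
        self.attn = nn.Linear(2 * hid_dim, 1)           # temporal attention head
        self.cls = nn.Linear(2 * hid_dim, num_classes)  # linear classifier

    def forward(self, feats: torch.Tensor):
        a_sem = normalize_adj(semantic_graph(feats))
        a_tmp = normalize_adj(transition_graph(feats))
        h_sem = F.relu(a_sem @ self.w_sem(feats))  # one GCN layer per stream
        h_tmp = F.relu(a_tmp @ self.w_tmp(feats))
        h = torch.cat([h_sem, h_tmp], dim=1)       # fused two-stream feature
        attn = torch.sigmoid(self.attn(h))         # (T, 1) temporal attention sequence
        cas = self.cls(attn * h)                   # (T, C) class activation sequence
        # Video-level logits for weak supervision: mean of the top-k activations.
        k = max(1, feats.size(0) // 8)
        video_logits = cas.topk(k, dim=0).values.mean(dim=0)
        return cas, attn, video_logits


# Usage: 100 segments of 1024-D pre-extracted features (e.g., I3D).
model = TwoStreamGCNFusion(in_dim=1024, hid_dim=256, num_classes=20)
cas, attn, video_logits = model(torch.randn(100, 1024))
```

At inference, action intervals would be obtained in the usual way for weakly supervised detection: threshold the attention and class activation sequences and group consecutive above-threshold segments into proposals.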
ISSN: 1863-1703 (print); 1863-1711 (electronic)
DOI: 10.1007/s11760-021-02039-5