stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition

Bibliographic Details
Published in:IEEE Transactions on Circuits and Systems for Video Technology 2020-02, Vol.30 (2), p.549-565
Main Authors: Qi, Mengshi, Wang, Yunhong, Qin, Jie, Li, Annan, Luo, Jiebo, Van Gool, Luc
Format: Article
Language:English
Description
Summary:In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.
ISSN:1051-8215, 1558-2205
DOI:10.1109/TCSVT.2019.2894161
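
Note: To make the mechanisms named in the abstract more concrete (a per-frame semantic graph over persons, message passing with shared factors, attention over key persons, and a temporal RNN over frames), the following is a minimal illustrative sketch in PyTorch. It is not the published stagNet implementation; the class and parameter names (MessagePassingAttentionRNN, feat_dim, mp_steps, etc.), the fully connected person graph, and all layer sizes are assumptions made purely for illustration.

# Minimal sketch: person-node message passing on a per-frame graph,
# attention pooling over persons, and a scene-level RNN over frames.
# All design choices below are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MessagePassingAttentionRNN(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=8, mp_steps=2):
        super().__init__()
        self.mp_steps = mp_steps
        # Shared ("factor sharing") transforms for node features and messages.
        self.node_fc = nn.Linear(feat_dim, hidden_dim)
        self.msg_fc = nn.Linear(hidden_dim, hidden_dim)
        # Attention scores over persons within a frame ("key persons").
        self.attn_fc = nn.Linear(hidden_dim, 1)
        # Scene-level temporal model over frames.
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, person_feats):
        # person_feats: (batch, num_frames, num_persons, feat_dim),
        # e.g. CNN features of detected person boxes in each frame.
        b, t, n, d = person_feats.shape
        h = torch.tanh(self.node_fc(person_feats))           # (b, t, n, hidden)

        # Message passing on a fully connected person graph: each node
        # receives the mean of the other nodes' messages (residual update).
        for _ in range(self.mp_steps):
            msg = torch.tanh(self.msg_fc(h))                  # (b, t, n, hidden)
            agg = (msg.sum(dim=2, keepdim=True) - msg) / max(n - 1, 1)
            h = h + agg

        # Soft attention over persons -> one scene descriptor per frame.
        scores = self.attn_fc(h).squeeze(-1)                  # (b, t, n)
        alpha = F.softmax(scores, dim=-1).unsqueeze(-1)       # (b, t, n, 1)
        scene = (alpha * h).sum(dim=2)                        # (b, t, hidden)

        # RNN over frames; classify group activity from the last hidden state.
        out, _ = self.rnn(scene)                              # (b, t, hidden)
        return self.classifier(out[:, -1])                    # (b, num_classes)


# Usage sketch with random tensors standing in for per-person CNN features.
model = MessagePassingAttentionRNN()
x = torch.randn(2, 10, 12, 512)   # 2 clips, 10 frames, 12 persons, 512-d feats
logits = model(x)                  # (2, num_classes) group-activity scores

The sketch only illustrates the general idea of combining graph-style message passing, person-level attention, and a temporal RNN; the paper's actual structural-RNN formulation, body-region attention, and global-part feature pooling are not reproduced here.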