Action Recognition with Temporal Scale-Invariant Deep Learning Framework
Published in: China Communications, 2017-02, Vol. 14 (2), p. 163-172
Main Authors: , , , ,
Format: Article
Language: English
Summary: Recognizing actions from video features is an important problem in a wide scope of applications. In this paper, we propose a temporal scale-invariant deep learning framework for action recognition that is robust to changes in action speed. Specifically, a video is first split into several sub-action clips, and a keyframe is selected from each sub-action clip. The spatial and motion features of the keyframe are extracted separately by two Convolutional Neural Networks (CNNs) and combined in a convolutional fusion layer that learns the relationship between the features. Long Short-Term Memory (LSTM) networks are then applied to the fused features to capture long-term temporal cues. Finally, the action prediction scores of the LSTM networks are combined by linear weighted summation. Extensive experiments are conducted on two popular and challenging benchmarks, UCF-101 and HMDB51 Human Actions. On both benchmarks, our framework surpasses state-of-the-art methods, achieving 93.7% accuracy on UCF-101 and 69.5% on HMDB51.
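The summary above outlines a multi-stage pipeline. As a minimal sketch (not the paper's implementation), the clip splitting, keyframe selection, and final linear weighted summation of per-clip prediction scores might look like the following in plain Python; the CNN feature extraction, fusion layer, and LSTM stages are replaced here by stand-in score vectors, and the middle-frame keyframe rule is a hypothetical placeholder for whatever selection criterion the paper uses:

```python
def split_into_clips(frames, num_clips):
    """Split a frame sequence into num_clips roughly equal sub-action clips."""
    n = len(frames)
    bounds = [round(i * n / num_clips) for i in range(num_clips + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_clips)]

def select_keyframe(clip):
    """Stand-in keyframe selection: take the middle frame of the clip."""
    return clip[len(clip) // 2]

def fuse_scores(clip_scores, weights=None):
    """Linear weighted summation of per-clip class-score vectors."""
    if weights is None:  # uniform weights over clips by default
        weights = [1.0 / len(clip_scores)] * len(clip_scores)
    num_classes = len(clip_scores[0])
    return [sum(w * s[c] for w, s in zip(weights, clip_scores))
            for c in range(num_classes)]

# Toy run: a 16-"frame" video (frame indices), 4 sub-action clips, 3 classes.
frames = list(range(16))
clips = split_into_clips(frames, 4)
keyframes = [select_keyframe(c) for c in clips]  # [2, 6, 10, 14]

# Stand-in per-clip class scores (in the paper these come from CNN+LSTM stages).
clip_scores = [
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.6, 0.2],
]
final = fuse_scores(clip_scores)
predicted = max(range(len(final)), key=lambda c: final[c])  # class 1 wins
```

The uniform weights are just a default; the framework learns or assigns non-uniform weights, which this sketch accepts via the `weights` argument.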
ISSN: 1673-5447
DOI: 10.1109/CC.2017.7868164