Action Recognition with Temporal Scale-Invariant Deep Learning Framework

Bibliographic Details
Published in: China Communications 2017-02, Vol. 14 (2), p. 163-172
Main Authors: Chen, Huafeng, Chen, Jun, Hu, Ruimin, Chen, Chen, Wang, Zhongyuan
Format: Article
Language:English
Description
Summary: Recognizing actions from video features is an important problem in a wide range of applications. In this paper, we propose a temporal scale-invariant deep learning framework for action recognition that is robust to changes in action speed. Specifically, a video is first split into several sub-action clips, and a keyframe is selected from each sub-action clip. The spatial and motion features of the keyframe are extracted separately by two Convolutional Neural Networks (CNNs) and combined in a convolutional fusion layer that learns the relationship between the features. Then, Long Short-Term Memory (LSTM) networks are applied to the fused features to capture long-term temporal cues. Finally, the action prediction scores of the LSTM networks are combined by linear weighted summation. Extensive experiments are conducted on two popular and challenging benchmarks, UCF-101 and the HMDB51 Human Actions dataset. On both benchmarks, our framework outperforms state-of-the-art methods, achieving 93.7% accuracy on UCF-101 and 69.5% on HMDB51.
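The pipeline outlined in the abstract (clip splitting, keyframe selection, and linear weighted summation of per-clip prediction scores) can be sketched in a few lines. This is a minimal illustration only: the CNN feature extractors, fusion layer, and LSTM stages are omitted, and the middle-frame keyframe rule and all function names are assumptions for illustration, not the authors' actual method.

```python
def split_into_clips(frames, num_clips):
    """Split a frame sequence into num_clips roughly equal sub-action clips."""
    n = len(frames)
    bounds = [round(i * n / num_clips) for i in range(num_clips + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_clips)]

def select_keyframe(clip):
    """Hypothetical keyframe rule: take the middle frame of each clip.
    (The paper's keyframe selection criterion may differ.)"""
    return clip[len(clip) // 2]

def combine_scores(clip_scores, weights):
    """Linear weighted summation of per-clip class-score vectors,
    normalized so the weights sum to one."""
    total = sum(weights)
    num_classes = len(clip_scores[0])
    return [
        sum(w * s[c] for w, s in zip(weights, clip_scores)) / total
        for c in range(num_classes)
    ]
```

In a full implementation, each keyframe would pass through the spatial and motion CNNs, the fused feature sequence through the LSTM, and `combine_scores` would merge the LSTM outputs into the final action prediction.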
ISSN: 1673-5447
DOI: 10.1109/CC.2017.7868164