
Human action recognition using short-time motion energy template images and PCANet features

Bibliographic Details
Published in: Neural Computing & Applications, 2020-08, Vol. 32 (16), p. 12561-12574
Main Authors: Abdelbaky, Amany; Aly, Saleh
Format: Article
Language:English
Description
Summary: Human action recognition has received significant attention because of its wide applications in human–machine interaction, visual surveillance, and video indexing. Recent progress in deep learning algorithms and convolutional neural networks (CNNs) has significantly improved the performance of many action recognition systems. However, CNNs are designed to learn features from 2D spatial images, whereas action videos are 3D spatiotemporal signals. In addition, the complex structure of most deep networks and their dependence on the backpropagation learning algorithm hinder the efficiency of real-time human action recognition systems. To avoid these limitations, we propose a new human action recognition method based on the principal component analysis network (PCANet), a simple CNN-like architecture that relies on unsupervised learning instead of the commonly used supervised algorithms. PCANet, however, was originally designed for 2D image classification. To make it suitable for action recognition in videos, the temporal information of the input video is represented using motion energy templates: multiple short-time motion energy image (ST-MEI) templates are computed to capture human motion. The deep structure of PCANet then learns hierarchical local motion features from these ST-MEI templates. The dimensionality of the PCANet feature vector is reduced using the whitening principal component analysis (WPCA) algorithm, and the result is fed to a linear support vector machine (SVM) classifier. The proposed method is evaluated on three benchmark datasets, namely KTH, Weizmann, and UCF Sports action. Experimental results using a leave-one-out strategy demonstrate the effectiveness of the proposed method compared with other state-of-the-art methods.
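
The abstract outlines a four-stage pipeline: ST-MEI templates, PCANet features, WPCA dimensionality reduction, and a linear SVM. The sketch below is a minimal, hypothetical illustration of that pipeline and not the authors' implementation: the window length, motion threshold, patch size, number of filters, and the single-stage PCANet (the paper uses a deeper structure with block histograms) are all illustrative assumptions, and NumPy, SciPy, and scikit-learn stand in for whatever tooling the paper used.

    # Hypothetical sketch of the ST-MEI + PCANet + WPCA + linear SVM pipeline.
    # Parameters (window, thresh, patch, n_filters) are illustrative only.
    import numpy as np
    from scipy.signal import convolve2d
    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC

    def short_time_mei(frames, window=10, thresh=30):
        """Accumulate thresholded frame differences over short windows (ST-MEI)."""
        diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)) > thresh
        templates = []
        for start in range(0, len(diffs) - window + 1, window):
            templates.append(diffs[start:start + window].max(axis=0).astype(np.float32))
        return templates  # list of 2D motion energy templates

    def learn_pca_filters(images, patch=7, n_filters=8):
        """One PCANet stage: PCA over mean-removed local patches gives conv filters."""
        patches = []
        for img in images:
            for i in range(0, img.shape[0] - patch, patch):
                for j in range(0, img.shape[1] - patch, patch):
                    p = img[i:i + patch, j:j + patch].ravel()
                    patches.append(p - p.mean())
        pca = PCA(n_components=n_filters).fit(np.array(patches))
        return pca.components_.reshape(n_filters, patch, patch)

    def pcanet_features(img, filters):
        """Convolve with learned filters, binarize, and histogram the binary codes."""
        maps = np.stack([convolve2d(img, f, mode='same') for f in filters])
        bits = (maps > 0).astype(np.int64)
        codes = np.tensordot(2 ** np.arange(len(filters)), bits, axes=1)
        hist, _ = np.histogram(codes, bins=2 ** len(filters))
        return hist.astype(np.float32)

    def train(videos, labels):
        """videos: list of (T, H, W) grayscale arrays; labels: action classes."""
        templates, y = [], []
        for v, lab in zip(videos, labels):
            for t in short_time_mei(v):
                templates.append(t)
                y.append(lab)
        filters = learn_pca_filters(templates)
        X = np.array([pcanet_features(t, filters) for t in templates])
        wpca = PCA(n_components=min(64, len(X), X.shape[1]), whiten=True).fit(X)
        clf = LinearSVC().fit(wpca.transform(X), y)
        return filters, wpca, clf

Under these assumptions, train(videos, labels) returns the learned filter bank, the WPCA projection, and the classifier; a test video would be scored by applying the same three steps to its own ST-MEI templates.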
ISSN: 0941-0643
1433-3058
DOI: 10.1007/s00521-020-04712-1