GSAS: Enhancing efficiency of human activity recognition using GRU based Sub-activity stitching

Bibliographic Details
Main Authors: Deotale, Disha; Verma, Madhushi; Suresh, P.
Format: Conference Proceeding
Language: English
Description
Summary: Recognition of human activities requires the design of highly efficient models for key-frame selection, image segmentation, feature extraction and selection, sub-activity classification, and sub-activity stitching. Because effective frame selection and segmentation methods are now available, most research in human activity recognition (HAR) focuses on feature extraction and the classification that follows it. Most of these models work on the principle of direct activity recognition, in which the entire video sequence is scanned and the final activity is recognized from it. This approach has several shortcomings: limited accuracy, because a large number of frames must be classified; limited scalability, because training and evaluation incur large delays; and limited applicability, because large datasets are required for model training and validation. To address these drawbacks, this paper proposes a novel gated recurrent unit (GRU) based sub-activity stitching model (GSAS). The proposed model works in two phases: sub-activity recognition using a GRU, followed by sub-activity stitching using a 1D convolutional neural network (CNN). The former converts input frames into numerical sub-activity instances, while the latter processes these instances and converts them into final activity types. With this two-layered architecture, the proposed model achieves a precision of 89%, a recall of 85%, and an accuracy of 90%, which is higher than most recently proposed HAR models. The evaluation was performed on untrimmed datasets, which makes the proposed model applicable to real-time video sequences. The paper also recommends some transfer-learning approaches that can be applied to the proposed model to further improve its scalability and accuracy.
ISSN:2214-7853
DOI:10.1016/j.matpr.2022.03.071
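
The two-phase data flow described in the summary (a GRU that maps frames to sub-activity labels, then a 1D CNN that stitches those labels into one activity) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature size, hidden size, class counts, kernel width, random untrained weights, and all function names below are hypothetical and chosen only to show the shape of the pipeline.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random, untrained weights (assumption: stand-in for learned parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def gru_step(x, h, p):
    """One GRU update with update gate z, reset gate r and candidate state n."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], x), matvec(p["Un"], rh))]
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def recognise_sub_activities(frames, p):
    """Phase 1: run the GRU over per-frame features, emit one label per frame."""
    h = [0.0] * len(p["Uz"])
    labels = []
    for x in frames:
        h = gru_step(x, h, p)
        labels.append(argmax(matvec(p["Wout"], h)))
    return labels


def stitch(labels, n_sub, kernels, W_cls):
    """Phase 2: 1D convolution over the one-hot label sequence, then classify."""
    one_hot = [[1.0 if s == k else 0.0 for k in range(n_sub)] for s in labels]
    width = len(kernels[0])
    pooled = []
    for kernel in kernels:  # each kernel has shape width x n_sub
        responses = [
            sum(kernel[j][k] * one_hot[t + j][k]
                for j in range(width) for k in range(n_sub))
            for t in range(len(one_hot) - width + 1)
        ]
        pooled.append(max(max(responses), 0.0))  # ReLU followed by global max pooling
    return argmax(matvec(W_cls, pooled))


# Hypothetical sizes: 8-dim frame features, 4 sub-activities, 3 activities.
FEAT, HID, N_SUB, N_ACT, N_KERNELS, WIDTH = 8, 6, 4, 3, 5, 3
rng = random.Random(0)
params = {k: rand_matrix(HID, FEAT, rng) for k in ("Wz", "Wr", "Wn")}
params.update({k: rand_matrix(HID, HID, rng) for k in ("Uz", "Ur", "Un")})
params["Wout"] = rand_matrix(N_SUB, HID, rng)
kernels = [rand_matrix(WIDTH, N_SUB, rng) for _ in range(N_KERNELS)]
W_cls = rand_matrix(N_ACT, N_KERNELS, rng)

# Stand-in for extracted key-frame features of a 12-frame clip.
frames = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(12)]
sub_labels = recognise_sub_activities(frames, params)
activity = stitch(sub_labels, N_SUB, kernels, W_cls)
print(sub_labels, activity)
```

Because the weights are random, the printed labels carry no meaning; the sketch only shows how a long frame sequence is first reduced to a short sequence of sub-activity indices, so that the second stage classifies a compact symbolic sequence instead of raw frames.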