An open-source MP + CNN + BiLSTM model-based hybrid model for recognizing sign language on smartphones

Bibliographic Details
Published in: International Journal of System Assurance Engineering and Management, 2024-08, Vol. 15 (8), pp. 3794-3806
Main Authors: Ghanimi, Hayder M. A., Sengan, Sudhakar, Sadu, Vijaya Bhaskar, Kaur, Parvinder, Kaushik, Manju, Alroobaea, Roobaea, Baqasah, Abdullah M., Alsafyani, Majed, Dadheech, Pankaj
Format: Article
Language: English
Description
Summary: The communication barriers experienced by deaf and hard-of-hearing individuals often lead to social isolation and limited access to essential services, underlining a critical need for effective and accessible solutions. Recognizing the unique challenges this community faces, such as the scarcity of sign language interpreters (particularly in remote areas) and the lack of real-time translation tools, this paper proposes a smartphone-runnable sign language recognition model to address the communication problems faced by deaf and hard-of-hearing persons. The proposed model combines MediaPipe hand tracking with particle filtering (PF) to accurately detect and track hand movements, and a convolutional neural network (CNN) with bidirectional long short-term memory (BiLSTM) gesture recognition model to capture the temporal dynamics of sign language gestures. These models use a small number of layers and filters, depthwise separable convolutions, and dropout layers to minimize computational cost and prevent overfitting, making them suitable for smartphone implementation. The article discusses the existing challenges faced by the deaf and hard-of-hearing community and explains how the proposed model could help overcome them. A MediaPipe + PF model performs feature extraction from the image and data preprocessing. During training, with fewer activation functions and parameters, the proposed model outperformed the other CNN-with-RNN variant models used in the experiments (CNN + LSTM, CNN + GRU) in convergence speed and learning efficiency.
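As a rough illustration of the pipeline the abstract describes, the sketch below pairs MediaPipe hand-landmark extraction with a lightweight CNN + BiLSTM classifier built from depthwise separable convolutions and dropout. This is a minimal sketch, not the authors' implementation: the sequence length, layer widths, class count, and the extract_landmarks helper are all assumptions, and the particle-filter tracking stage is omitted.

import cv2
import mediapipe as mp
import numpy as np
from tensorflow.keras import layers, models

# Assumed sizes: 30-frame clips, 21 MediaPipe hand landmarks with
# (x, y, z) coordinates, and a hypothetical 26-class sign vocabulary.
SEQ_LEN, NUM_LANDMARKS, COORDS, NUM_CLASSES = 30, 21, 3, 26

def extract_landmarks(video_path):
    """Per-frame (x, y, z) hand landmarks via MediaPipe Hands (hypothetical helper)."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.hands.Hands(max_num_hands=1,
                                  min_detection_confidence=0.5) as hands:
        ok, frame = cap.read()
        while ok:
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                lm = result.multi_hand_landmarks[0].landmark
                frames.append([[p.x, p.y, p.z] for p in lm])
            ok, frame = cap.read()
    cap.release()
    return np.array(frames, dtype=np.float32)

def build_classifier():
    """Lightweight CNN + BiLSTM over landmark sequences."""
    inputs = layers.Input(shape=(SEQ_LEN, NUM_LANDMARKS, COORDS))
    # Depthwise separable 1D convolution applied frame-by-frame keeps the
    # parameter count small, as the abstract emphasizes.
    x = layers.TimeDistributed(
        layers.SeparableConv1D(32, 3, padding="same", activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.GlobalAveragePooling1D())(x)
    x = layers.Dropout(0.3)(x)                    # dropout against overfitting
    x = layers.Bidirectional(layers.LSTM(64))(x)  # temporal dynamics of the gesture
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

In practice, extract_landmarks would feed fixed-length 30-frame windows into the classifier; padding or trimming of clips and the PF smoothing step are left out for brevity.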
ISSN: 0975-6809, 0976-4348
DOI: 10.1007/s13198-024-02376-x