Signgraph: An Efficient and Accurate Pose-Based Graph Convolution Approach Toward Sign Language Recognition

Bibliographic Details
Published in: IEEE Access, 2023, Vol. 11, pp. 19135-19147
Main Authors: Naz, Neelma; Sajid, Hasan; Ali, Sara; Hasan, Osman; Ehsan, Muhammad Khurram
Format: Article
Language: English
Description
Summary: Sign language recognition (SLR) enables the deaf and speech-impaired community to integrate and communicate effectively with the rest of society. Word-level, or isolated, SLR is a fundamental yet complex task whose main objective is to correctly recognize signed words. Sign language consists of very fast and complex hand, body, and face movements, as well as mouthing cues, which make the task challenging. Several input modalities have been proposed for SLR: RGB, optical flow, RGB-D, and pose/skeleton. However, state-of-the-art (SOTA) methodologies built on these modalities tend to be exceedingly sophisticated and over-parameterized. In this paper, our focus is on using hand and body poses as the input modality. One major problem in pose-based SLR is extracting the most valuable and distinctive features for all skeleton joints. In this regard, we propose an accurate, efficient, and lightweight pose-based pipeline leveraging a graph convolution network (GCN) along with residual connections and a bottleneck structure. The proposed architecture not only facilitates efficient learning during model training, providing significantly improved accuracy scores, but also reduces computational complexity. With the proposed architecture in place, we achieve improved accuracies on three subsets of the WLASL dataset and on the LSA-64 dataset. Our model outperforms previous SOTA pose-based methods with relative improvements of 8.91%, 27.62%, and 26.97% on the WLASL-100, WLASL-300, and WLASL-1000 subsets, respectively. It also outperforms previous SOTA appearance-based methods with relative improvements of 2.65% and 5.15% on the WLASL-300 and WLASL-1000 subsets. On the LSA-64 dataset, our model achieves 100% test recognition accuracy. We achieve this improved performance at far lower computational cost than existing appearance-based methods.
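The building block the summary describes — a graph convolution over skeleton joints, combined with a channel bottleneck and a residual connection — can be sketched in a few lines of NumPy. This is an illustrative toy only, not the paper's implementation: the 4-joint chain graph, the channel sizes, and the random weights are all made-up assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def bottleneck_gcn_block(X, A_norm, W_reduce, W_graph, W_expand):
    # 1. Bottleneck reduction: project joint features to fewer channels
    H = np.maximum(X @ W_reduce, 0)          # ReLU
    # 2. Graph convolution: aggregate features from neighboring joints
    H = np.maximum(A_norm @ H @ W_graph, 0)  # ReLU
    # 3. Expansion back to the original channel count
    H = H @ W_expand
    # 4. Residual connection around the whole block
    return X + H

# Toy skeleton: 4 joints in a chain (e.g. shoulder-elbow-wrist-hand)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_norm = normalize_adjacency(A)

rng = np.random.default_rng(0)
C, C_b = 8, 2  # feature channels and (smaller) bottleneck channels
X = rng.normal(size=(4, C))  # one frame of per-joint features
out = bottleneck_gcn_block(
    X, A_norm,
    rng.normal(size=(C, C_b)),
    rng.normal(size=(C_b, C_b)),
    rng.normal(size=(C_b, C)),
)
print(out.shape)  # (4, 8)
```

The bottleneck (8 → 2 → 8 channels here) is what keeps the parameter count low, and the residual path lets the block fall back to identity if the graph convolution adds nothing useful — consistent with the "efficient and lightweight" claim, though the actual layer counts and channel widths in the paper may differ.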
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3247761