MIPA-ResGCN: a multi-input part attention enhanced residual graph convolutional framework for sign language recognition
Published in: | Computers & electrical engineering 2023-12, Vol.112, p.109009, Article 109009 |
---|---|
Format: | Article |
Language: | English |
Summary: | Sign language (SL) is the primary mode of communication for individuals who experience deafness and speech disorders. However, SL creates a considerable communication barrier because most people are not acquainted with it. To solve this problem, many technological solutions using wearable devices, video, and depth cameras have been put forth. The ubiquity of cameras in contemporary devices has made sign language recognition (SLR) from video sequences a viable and unobtrusive alternative. Nonetheless, SLR methods based on visual features, commonly known as appearance-based methods, are computationally expensive. In response to these challenges, this study introduces an accurate and computationally efficient pose-based approach for SLR. Our proposed approach comprises three key stages: pose extraction, handcrafted feature generation, and feature space mapping and recognition. First, an efficient off-the-shelf pose extraction algorithm is employed to extract pose information for various body parts of a subject captured in a video. Then, a multi-input stream is generated from handcrafted features, i.e., joints, bone lengths, and bone angles. Finally, an efficient and lightweight residual graph convolutional network (ResGCN), together with a novel part attention mechanism, is proposed to encode the body's spatial and temporal information in a compact feature space and recognize the signs performed. In addition to enabling effective learning during model training and offering cutting-edge accuracy, the proposed model significantly reduces computational complexity. Our proposed method is assessed on five challenging SL datasets, WLASL-100, WLASL-300, WLASL-1000, LSA-64, and MINDS-Libras, achieving state-of-the-art (SOTA) accuracies of 83.33 %, 72.90 %, 64.92 %, 100 ± 0 %, and 96.70 ± 1.07 %, respectively. Compared to previous approaches, we achieve superior performance while incurring a lower computational cost. |
---|---|
ISSN: | 0045-7906 1879-0755 |
DOI: | 10.1016/j.compeleceng.2023.109009 |
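
The summary names three handcrafted input streams (joints, bone lengths, and bone angles) but this record does not give their definitions. The following is a minimal sketch of one plausible way to derive them from extracted pose keypoints, not the authors' implementation. The skeleton layout `PARENTS`, the function name `multi_input_features`, and the choice to define a bone angle as the angle between a bone and each coordinate axis are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 5-joint skeleton: PARENTS[i] is the parent of joint i
# (the root joint 0 points to itself). Not the paper's joint layout.
PARENTS = [0, 0, 1, 2, 3]


def multi_input_features(joints, parents=PARENTS, eps=1e-8):
    """Build joint, bone-length, and bone-angle streams.

    joints: array of shape (T, V, C) = (frames, joints, coordinates).
    Returns (joints, lengths, angles) with shapes
    (T, V, C), (T, V), and (T, V, C).
    """
    joints = np.asarray(joints, dtype=float)
    # Bone vector: each joint minus its parent joint (zero for the root).
    bones = joints - joints[:, parents, :]
    # Bone length: Euclidean norm of each bone vector.
    lengths = np.linalg.norm(bones, axis=-1)
    # Assumed bone-angle definition: angle between the bone and each
    # coordinate axis. Zero-length bones yield pi/2 on every axis.
    cosines = bones / np.maximum(lengths[..., None], eps)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return joints, lengths, angles


if __name__ == "__main__":
    # One frame of 2D keypoints for the hypothetical skeleton.
    pose = np.array([[[0, 0], [3, 4], [3, 5], [4, 5], [4, 5]]], dtype=float)
    j, l, a = multi_input_features(pose)
    print(j.shape, l.shape, a.shape)
```

In a pipeline like the one summarized above, the three returned arrays would be fed to separate input branches of the network; how the streams are normalized and fused is not stated in this record.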