Quantized depth image and skeleton-based multimodal dynamic hand gesture recognition
Published in: The Visual Computer, 2024, Vol. 40(1), pp. 11-25
Main Authors: , ,
Format: Article
Language: English
Summary: An existing approach to dynamic hand gesture recognition is to use a multimodal-fusion CRNN (Convolutional Recurrent Neural Network) on depth images and the corresponding 2D hand skeleton coordinates. However, an underlying problem in this method is that raw depth images have very low contrast in the hand ROI (region of interest). They do not highlight the details that are important to fine-grained hand gesture recognition, such as finger orientation, overlap between the fingers and the palm, or overlap between multiple fingers. To address this issue, we propose generating quantized depth images as an alternative input modality to raw depth images. This creates sharp relative contrasts between key parts of the hand, which improves gesture recognition performance. In addition, we explore ways to tackle the high-variance problem in previously researched multimodal-fusion CRNN architectures. We obtained accuracies of 90.82% and 89.21% (14 and 28 gestures, respectively) on the DHG-14/28 dataset and accuracies of 93.81% and 90.24% (14 and 28 gestures, respectively) on the SHREC-2017 dataset, a significant improvement over previous multimodal-fusion CRNNs.
ISSN: 0178-2789; 1432-2315
DOI: 10.1007/s00371-022-02762-1
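The depth-quantization idea described in the summary can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, assuming simple uniform binning over the hand region; the paper's actual quantization scheme is not specified in this record and may differ, and the function name and parameters are illustrative only.

```python
import numpy as np

def quantize_depth(depth, num_levels=8):
    """Quantize a raw depth map into a few discrete levels to create
    sharp relative contrasts between parts of the hand.

    Assumes `depth` is a 2D array of depth values where 0 marks
    background pixels (a common convention for depth sensors).
    """
    hand = depth[depth > 0]
    if hand.size == 0:
        return np.zeros_like(depth, dtype=np.uint8)
    lo, hi = hand.min(), hand.max()
    # Uniformly bin the depth range of the hand ROI into num_levels bands.
    bins = ((depth - lo) / max(hi - lo, 1e-6) * num_levels).astype(int)
    bins = np.clip(bins, 0, num_levels - 1)
    # Stretch the bin indices back to the 8-bit range for display/training.
    out = (bins * (255 // (num_levels - 1))).astype(np.uint8)
    out[depth == 0] = 0  # keep background at zero
    return out
```

With few levels, nearby depth values inside the hand ROI fall into visibly distinct bands, which is the "sharp relative contrast" the abstract attributes to quantized depth images.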