Deep-learning-based real-time silent speech recognition using facial electromyogram recorded around eyes for hands-free interfacing in a virtual reality environment
Published in: | Virtual reality : the journal of the Virtual Reality Society 2022-09, Vol.26 (3), p.1047-1057 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | Speech recognition technology is a promising hands-free interfacing modality for virtual reality (VR) applications. However, it has several drawbacks, such as limited usability in noisy environments or public places and limited accessibility for those who cannot produce loud and clear voices. These limitations may be overcome by employing silent speech recognition (SSR) technology that uses facial electromyograms (fEMGs) in a VR environment. In conventional SSR systems, however, fEMG electrodes were attached around the user’s lips and neck, which created new practical issues: the need for an additional wearable system besides the VR headset, a complex and time-consuming procedure for attaching the fEMG electrodes, and discomfort and restricted facial muscle movement for the user. To solve these problems, we propose an SSR system using fEMGs measured by a few electrodes attached around the user’s eyes, which can be easily incorporated into available VR headsets. To enhance the accuracy of classifying fEMG signals recorded from a limited number of locations relatively far from the phonatory organs, a deep neural network-based classification method was developed using similar fEMG data previously collected from other individuals and then transformed by dynamic positional warping. In the experiments, the proposed SSR system classified six different fEMG patterns generated by six silently spoken words with an accuracy of 92.53%. To further demonstrate that our SSR system can be used as a hands-free control interface in practical VR applications, an online SSR system was implemented. |
---|---|
ISSN: | 1359-4338, 1434-9957 |
DOI: | 10.1007/s10055-021-00616-0 |