Loading…

Using Lip Reading Recognition to Predict Daily Mandarin Conversation

Audio-based automatic speech recognition as a hearing aid is susceptible to background noise and overlapping speeches. Consequently, audio-visual speech recognition has been developed to complement the audio input with additional visual information. However, the huge improvement of neural networks i...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access 2022, Vol.10, p.53481-53489
Main Authors: Haq, Muhamad Amirul, Ruan, Shanq-Jang, Cai, Wen-Jie, Li, Lieber Po-Hung
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Audio-based automatic speech recognition as a hearing aid is susceptible to background noise and overlapping speeches. Consequently, audio-visual speech recognition has been developed to complement the audio input with additional visual information. However, the huge improvement of neural networks in the visual task has resulted in a robust and reliable lip reading framework that can recognize speech from visual input alone. In this work, we propose a lip reading recognition model to predict daily Mandarin conversation and collect a new Daily Mandarin Conversation Lip Reading (DMCLR) dataset, consisting of 1,000 videos from 100 daily conversations spoken by ten speakers. Our model consists of a spatiotemporal convolution layer, a SE-ResNet-18 network, and a back-end module consisting of bi-directional gated recurrent unit (Bi-GRU), 1D convolution, and fully-connected layers. This model is able to reach 94.2% of accuracy in the DMCLR dataset. Such performance makes it possible for Mandarin lip reading applications to be practical in real life. Additionally, we are able to achieve 86.6% and 57.2% accuracy on Lip Reading in the Wild (LRW) and LRW-1000 (Mandarin), respectively. The results show that our method achieves state-of-the-art performance on these two challenging datasets.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2022.3175867