
A comparison of model validation techniques for audio-visual speech recognition

This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. As ASR that uses only speech can be contaminated by environmental noise, in some applications it may improve performance to employ Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. In this paper, model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented to validate the performance of an AVSR system as well as to provide a comparison of the performance of the validation techniques themselves. A new speech data corpus is used, namely the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers with five sets of samples uttered by each speaker. The database is divided into training and testing sets and processed in manners suitable for the validation techniques under investigation. The performance is evaluated for a range of signal-to-noise ratio values using a variety of noise types obtained from the NOISEX-92 dataset.
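To make the three validation schemes named in the abstract concrete, the sketch below applies holdout, leave-one-out and bootstrap (out-of-bag) validation to a toy feature matrix shaped like the LUNA-V corpus (10 speakers, five sample sets per speaker). Everything in it is illustrative: the random features, the feature dimension and the scikit-learn GaussianNB classifier are placeholders standing in for the paper's HMM/HTK audio-visual recogniser, not a reproduction of its method.

```python
import numpy as np
from sklearn.model_selection import train_test_split, LeaveOneOut
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Toy stand-in for audio-visual feature vectors:
# 10 speakers x 5 sample sets each, as in LUNA-V (labels = speaker identity).
n_speakers, n_sets, n_features = 10, 5, 20
X = rng.normal(size=(n_speakers * n_sets, n_features))
y = np.repeat(np.arange(n_speakers), n_sets)

def score(model, X_tr, y_tr, X_te, y_te):
    """Train on one split and return accuracy on the held-out part."""
    return model.fit(X_tr, y_tr).score(X_te, y_te)

# 1) Holdout: one fixed train/test partition.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print("holdout      :", score(GaussianNB(), X_tr, y_tr, X_te, y_te))

# 2) Leave-one-out: every sample serves as the test set exactly once.
loo_scores = [
    score(GaussianNB(), X[tr], y[tr], X[te], y[te])
    for tr, te in LeaveOneOut().split(X)
]
print("leave-one-out:", float(np.mean(loo_scores)))

# 3) Bootstrap: train on a resample drawn with replacement and test on
#    the samples left out of that draw (out-of-bag estimate).
boot_scores = []
for b in range(100):
    idx = resample(np.arange(len(X)), replace=True, random_state=b)
    oob = np.setdiff1d(np.arange(len(X)), idx)
    boot_scores.append(score(GaussianNB(), X[idx], y[idx], X[oob], y[oob]))
print("bootstrap    :", float(np.mean(boot_scores)))
```

Holdout yields a single estimate from one partition, leave-one-out averages as many estimates as there are samples, and the out-of-bag bootstrap averages over repeated resamples; the paper compares how such estimates behave for an AVSR system across noise types and SNR levels.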

Bibliographic Details
Main Authors: Thum W. Seong, M.Z. Ibrahim, Nurul W. Arshad, David Mulvaney
Format: Conference proceeding
Published: 2017
Subjects: Audio-visual speech recognition; Hidden Markov models; HTK toolkit; Holdout validation; Leave-one-out cross validation; Bootstrap validation; Mechanical engineering not elsewhere classified
Online Access: https://hdl.handle.net/2134/27016
author Thum W. Seong
M.Z. Ibrahim
Nurul W. Arshad
David Mulvaney
author_sort Thum W. Seong (7208510)
collection Figshare
description This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. As ASR that uses only speech can be contaminated by environmental noise, in some applications it may improve performance to employ Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. In this paper, model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented to validate the performance of an AVSR system as well as to provide a comparison of the performance of the validation techniques themselves. A new speech data corpus is used, namely the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers with five sets of samples uttered by each speaker. The database is divided into training and testing sets and processed in manners suitable for the validation techniques under investigation. The performance is evaluated for a range of signal-to-noise ratio values using a variety of noise types obtained from the NOISEX-92 dataset.
format Default
Conference proceeding
id rr-article-9552131
institution Loughborough University
publishDate 2017
record_format Figshare
spelling rr-article-9552131 2017-01-01T00:00:00Z
A comparison of model validation techniques for audio-visual speech recognition
Authors: Thum W. Seong (7208510); M.Z. Ibrahim (7204967); Nurul W. Arshad (7208513); David Mulvaney (1252071)
Subjects: Audio-visual speech recognition; Hidden Markov models; HTK toolkit; Holdout validation; Leave-one-out cross validation; Bootstrap validation; Mechanical engineering not elsewhere classified
Published: 2017-01-01T00:00:00Z
Type: Text; Conference contribution
Handle: 2134/27016
Figshare: https://figshare.com/articles/conference_contribution/A_comparison_of_model_validation_techniques_for_audio-visual_speech_recognition/9552131
License: CC BY-NC-ND 4.0
title A comparison of model validation techniques for audio-visual speech recognition
title_sort comparison of model validation techniques for audio-visual speech recognition
topic Mechanical engineering not elsewhere classified
Audio-visual speech recognition
Hidden Markov models
HTK toolkit
Holdout validation
Leave-one-out cross validation
Bootstrap validation
url https://hdl.handle.net/2134/27016