
Fuzzy Phoneme Classification Using Multi-speaker Vocal Tract Length Normalization

Bibliographic Details
Published in: Technical review - IETE, 2014-03, Vol. 31 (2), pp. 128-136
Main Authors: Lung, Jensen Wong Jing, Salam, Md. Sah Hj, Rehman, Amjad, Rahim, Mohd Shafry Mohd, Saba, Tanzila
Format: Article
Language: English
Description
Summary: The overall success of automatic speech recognition (ASR) depends on phoneme recognition performance and on the quality of the speech signal the system receives. Differences between speakers' inputs, however, degrade overall recognition performance, and one of the main causes is inter-speaker variability. Vocal tract length normalization (VTLN) compensates for inter-speaker variation by applying a speaker-specific warping to the frequency scale of a filter bank. Rather than measuring performance at the word level with speaker-specific warping, this research works directly at the phoneme level, applying VTLN to all speakers' speech signals to find the setting that yields the highest recognition performance. It compares phoneme recognition results for warp factors from 0.74 to 1.54, in increments of 0.02, across nine different frequency warping boundary ranges. The warp factor and frequency warping range that give the highest phoneme recognition performance are then applied to word recognition. Compared with the baseline, a warp factor of 1.40 over the 300-5000 Hz frequency range improved phoneme recognition by 0.7% and spoken word recognition by 0.5%.
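The summary describes two steps: warping a filter bank's frequency axis by a warp factor inside a boundary range, and searching a grid of warp factors over several ranges for the best phoneme recognition score. The Python sketch that follows illustrates both under stated assumptions. The piecewise-linear warp is one common VTLN form, not necessarily the function used in the paper; `f_max = 8000` Hz assumes 16 kHz audio; and the names `warp_frequency`, `warp_factor_grid`, `best_setting`, and the `score` callback are hypothetical stand-ins for a real recognizer evaluation.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Band = Tuple[float, float]


def warp_frequency(f: float, alpha: float, f_lo: float, f_hi: float,
                   f_min: float = 0.0, f_max: float = 8000.0) -> float:
    """Hypothetical piecewise-linear VTLN warp.

    Inside the boundary band [f_lo, f_hi] the frequency is scaled by alpha.
    Outside it, straight segments join the band to the fixed edges f_min
    and f_max, so the warped axis still spans [f_min, f_max].
    With f_min = 0 the lower segment coincides with alpha * f.
    """
    if not (f_min < f_lo < f_hi < f_max):
        raise ValueError("need f_min < f_lo < f_hi < f_max")
    lo_w, hi_w = alpha * f_lo, alpha * f_hi
    if not (f_min < lo_w and hi_w < f_max):
        raise ValueError("alpha pushes the warped band outside [f_min, f_max]")
    if f <= f_lo:
        # Segment from (f_min, f_min) to (f_lo, alpha * f_lo).
        return f_min + (f - f_min) * (lo_w - f_min) / (f_lo - f_min)
    if f <= f_hi:
        # Linear scaling inside the warping boundary band.
        return alpha * f
    # Segment from (f_hi, alpha * f_hi) to (f_max, f_max).
    return hi_w + (f - f_hi) * (f_max - hi_w) / (f_max - f_hi)


def warp_factor_grid(start: float = 0.74, stop: float = 1.54,
                     step: float = 0.02) -> List[float]:
    """Warp factors from start to stop inclusive (41 values by default).

    Integer stepping plus rounding to 2 decimals avoids float drift for
    the default 0.02 step.
    """
    n = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n + 1)]


def best_setting(score: Callable[[float, Band], float],
                 bands: Iterable[Band],
                 alphas: Sequence[float]) -> Tuple[float, Band, float]:
    """Exhaustive search: return (alpha, band, score) with the highest score.

    `score` is a hypothetical callback, e.g. phoneme accuracy of a
    recognizer run on features warped with (alpha, band).
    """
    best = None
    for band in bands:
        for alpha in alphas:
            s = score(alpha, band)
            if best is None or s > best[2]:
                best = (alpha, band, s)
    if best is None:
        raise ValueError("no bands or warp factors to search")
    return best
```

In a real experiment `score` would extract filter-bank features on the warped axis and return recognition accuracy; the search itself is a plain double loop over the nine boundary ranges and the 41 warp factors.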
ISSN: 0256-4602; 0974-5971
DOI: 10.1080/02564602.2014.892669