Time Frequency Representation for Speech Recognition

Bibliographic Details
Main Authors: Amsalem, A., Shallom, I.D.
Format: Conference Proceeding
Language: English
Description
Summary: In speech recognition, incorporating the dynamics of speech has been shown to improve recognition accuracy. This idea underlies Mel frequency cepstral coefficients (MFCC) and their derivatives, which represent both the static and the dynamic characteristics of the vocal tract. In this paper, a new method for capturing the dynamic features of non-stationary speech signals is presented. The proposed approach isolates each cepstral band and projects it onto an orthogonal space spanned by a set of well-defined orthogonal functions. The main idea is to capture and represent energy transitions between successive short-term speech frames along a non-stationary segment of about 100 ms. Non-stationary speech segments are represented by time-frequency representations (TFR), and the analysis is modified to fit two-dimensional data. An evaluation of the introduced features on the TIDIGITS corpus showed an average 58% improvement in word error rate compared with MFCC and its derivatives, in the context of isolated speech recognition in noisy environments.
DOI:10.1109/ITRE.2006.381542
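
The projection step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not name the orthogonal functions, so an orthonormal DCT-II basis is assumed here. The 10-frame segment length (about 100 ms at an assumed 10 ms frame shift) and the names `dct_basis` and `project_segment` are also hypothetical. Each cepstral band is treated as a short trajectory over time and reduced to a few projection coefficients.

```python
import math


def dct_basis(num_frames, num_funcs):
    """Return num_funcs orthonormal DCT-II functions sampled at num_frames points.

    Assumption: the record does not say which orthogonal functions the paper
    uses; the DCT-II basis is only a stand-in for illustration.
    """
    basis = []
    for k in range(num_funcs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_frames)
        basis.append([
            scale * math.cos(math.pi * k * (2 * t + 1) / (2 * num_frames))
            for t in range(num_frames)
        ])
    return basis


def project_segment(segment, num_funcs=3):
    """Project each cepstral band's time trajectory onto the basis.

    segment: list of frames, each frame a list of cepstral coefficients.
    Returns one list of num_funcs projection coefficients per band.
    """
    num_frames = len(segment)
    num_bands = len(segment[0])
    basis = dct_basis(num_frames, num_funcs)
    features = []
    for band in range(num_bands):
        # Isolate one cepstral band across the whole segment.
        trajectory = [frame[band] for frame in segment]
        # Inner product of the trajectory with each basis function.
        features.append([
            sum(b * x for b, x in zip(func, trajectory)) for func in basis
        ])
    return features


if __name__ == "__main__":
    # Toy segment: 10 frames, 2 bands (one constant, one rising linearly).
    toy = [[2.0, float(t)] for t in range(10)]
    for band, coeffs in enumerate(project_segment(toy)):
        print(band, [round(c, 4) for c in coeffs])
```

In this sketch the zeroth coefficient summarizes the band's average level over the segment, while the higher-order coefficients respond to how the band changes across frames, which is the kind of transition information the summary refers to.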