
Maximum-likelihood nonlinear transformation for acoustic adaptation

Bibliographic Details
Published in: IEEE Transactions on Speech and Audio Processing, 2004-11, Vol. 12 (6), p. 572-578
Main Authors: Padmanabhan, M., Dharanipragada, S.
Format: Article
Language:English
Description
Summary: In this paper, we describe an adaptation method for speech recognition systems that is based on a nonlinear transformation of the feature space. In contrast to most existing adaptation methods, which assume some form of affine transformation of either the feature vectors or the acoustic models that model them, our proposed method composes a general nonlinear transformation from two transformations: an affine transformation that combines the dimensions of the original feature space, and a nonlinear transformation applied independently to each dimension of the affinely transformed space, yielding a general multidimensional nonlinear transformation of the original feature space. The method also differs from other affine techniques in how the parameters of the transform are shared. In most previous methods, the parameters are shared on the basis of the phonetic class; in our method, the parameters of the nonlinear transformation are shared on the basis of location in the feature space rather than phonetic class. Experimental results show that the method outperforms affine methods, providing up to a 25% relative improvement in word error rate on an in-car speech recognition task.
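To make the structure of the composed transform described above concrete, the following is a minimal sketch, not taken from the paper: the choice of a tanh-style per-dimension warp, the parameter names, and the toy dimensions are all illustrative assumptions; the paper's actual parameterization and its maximum-likelihood estimation of the parameters are not reproduced here.

```python
# Illustrative sketch only: A, b, and the tanh-based per-dimension warp are
# hypothetical stand-ins for the "affine stage + independent per-dimension
# nonlinearity" composition described in the abstract.
import numpy as np

def composed_transform(x, A, b, alpha, beta):
    """Apply y = f(A @ x + b), where f acts independently on each dimension.

    x      : (d,) original feature vector
    A, b   : affine transform that mixes the d dimensions
    alpha, beta : (d,) per-dimension parameters of an assumed warp
    """
    z = A @ x + b                      # affine stage: combines dimensions
    # nonlinear stage: applied independently to every dimension of z
    return z + alpha * np.tanh(beta * z)

# Toy usage with random parameters (for illustration only).
rng = np.random.default_rng(0)
d = 13                                 # e.g. a cepstral feature dimension
x = rng.normal(size=d)
A = np.eye(d) + 0.1 * rng.normal(size=(d, d))
b = np.zeros(d)
alpha = 0.1 * rng.normal(size=d)
beta = np.ones(d)
y = composed_transform(x, A, b, alpha, beta)
```

In this sketch the warp parameters are global; under the sharing scheme the abstract describes, one would instead select (or interpolate) warp parameters according to where the transformed vector falls in the feature space, rather than according to its phonetic class.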
ISSN: 1063-6676, 2329-9290, 1558-2353, 2329-9304
DOI: 10.1109/TSA.2003.822629