A maximum-likelihood approach to stochastic matching for robust speech recognition
Published in: | IEEE transactions on speech and audio processing 1996-05, Vol.4 (3), p.190-202 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Presents a maximum-likelihood (ML) stochastic matching approach to decrease the acoustic mismatch between a test utterance and a given set of speech models so as to reduce the recognition performance degradation caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) $\Lambda_x$. The mismatch between the observed test utterance $Y$ and the models $\Lambda_x$ can be reduced in two ways: 1) by an inverse distortion function $F_\nu(\cdot)$ that maps $Y$ into an utterance $X$ that better matches the models $\Lambda_x$, and 2) by a model transformation function $G_\eta(\cdot)$ that maps $\Lambda_x$ to a transformed model $\Lambda_y$ that better matches the utterance $Y$. We assume the functional form of the transformation $F_\nu(\cdot)$ or $G_\eta(\cdot)$ and estimate the parameters $\nu$ or $\eta$ in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of $F_\nu(\cdot)$ or $G_\eta(\cdot)$ is based on prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models; no additional training data is required to estimate the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels. |
---|---|
ISSN: | 1063-6676 1558-2353 |
DOI: | 10.1109/89.496215 |
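
The feature-space idea in the Summary can be illustrated with a small sketch. It is not the paper's implementation and rests on several simplifying assumptions: a diagonal-covariance Gaussian mixture stands in for the HMM set, the inverse distortion is taken to be a fixed additive bias, F(y) = y - b, and the names `estimate_bias` and `demo` are hypothetical. Under those assumptions, EM alternates between computing component posteriors for the compensated frames and re-estimating the bias in closed form, using only the test frames and the given model.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def estimate_bias(frames, weights, means, variances, n_iter=20):
    """EM estimate of an additive bias b so that (frame - b) fits the mixture.

    Assumption: a Gaussian mixture replaces the HMM set, and the
    inverse distortion is F(y) = y - b with b constant over the utterance.
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            x = [yi - bi for yi, bi in zip(y, bias)]
            # E-step: posterior of each component given the compensated frame.
            logp = [
                math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logp)
            post = [math.exp(lp - top) for lp in logp]
            total = sum(post)
            for k, p in enumerate(post):
                gamma = p / total
                for d in range(dim):
                    num[d] += gamma * (y[d] - means[k][d]) / variances[k][d]
                    den[d] += gamma / variances[k][d]
        # M-step: closed-form, variance-weighted update for each dimension.
        bias = [n / d for n, d in zip(num, den)]
    return bias


def demo(seed=0):
    """Draw frames from a toy mixture, shift them, and recover the shift."""
    rng = random.Random(seed)
    weights = [0.5, 0.5]
    means = [[-2.0, 0.0], [2.0, 1.0]]
    variances = [[0.25, 0.25], [0.25, 0.25]]
    true_bias = [0.7, -0.4]
    frames = []
    for _ in range(500):
        k = 0 if rng.random() < 0.5 else 1
        frames.append([
            rng.gauss(means[k][d], math.sqrt(variances[k][d])) + true_bias[d]
            for d in range(2)
        ])
    return true_bias, estimate_bias(frames, weights, means, variances)
```

Calling `demo()` returns the bias used to distort the synthetic frames together with the EM estimate, so the two can be compared directly. A full HMM version would replace the mixture posteriors with state and mixture occupancies from the forward-backward algorithm; that step is omitted here.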