Language model estimation for optimizing end-to-end performance of a natural language call routing system
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Conventional methods for training statistical models for automatic speech recognition, such as acoustic and language models, have focused on criteria such as maximum likelihood and sentence or word error rate (WER). However, unlike dictation systems, the goal of a spoken dialogue system is to understand the meaning of what a person says, not to transcribe every word correctly. For such systems, we propose to optimize the statistical models under end-to-end system performance criteria. We illustrate this principle by focusing on the estimation of the language model (LM) component of a natural language call routing system. This estimation, carried out under a conditional maximum likelihood objective, aims at optimizing the call routing (classification) accuracy, which is often the criterion of interest in these systems. LM updates are derived using the extended Baum-Welch procedure (Gopalakrishnan et al., 1991). In our experiments, we find that this estimation procedure leads to a small but promising gain in classification accuracy. Interestingly, the estimated language models improve classification accuracy while increasing the word error rate, showing that the system with the best classification accuracy is not necessarily the one with the lowest WER. Significantly, our LM estimation procedure does not require correct transcriptions of the training data and can therefore be applied to unsupervised learning from untranscribed speech data.
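The summary refers to re-estimating LM probabilities with the extended Baum-Welch procedure of Gopalakrishnan et al. (1991) under a conditional maximum likelihood (classification-oriented) objective. As a rough, hedged illustration of that style of update, the sketch below applies the generic extended Baum-Welch growth transform to a single discrete distribution; the names `ebw_update`, `c_num`, `c_den`, and `D` are illustrative assumptions, not the paper's actual implementation or data.

```python
import numpy as np

def ebw_update(p, c_num, c_den, D=None, eps=1e-12):
    """Extended Baum-Welch style re-estimation of one discrete distribution
    (e.g. the word probabilities conditioned on a single LM history).

    p      -- current probabilities, shape (V,), summing to 1
    c_num  -- "numerator" counts from hypotheses scored with the correct class
    c_den  -- "denominator" counts accumulated over all competing classes
    D      -- smoothing constant; must be large enough that every updated
              probability stays positive (chosen heuristically if None)
    """
    p = np.asarray(p, dtype=float)
    diff = np.asarray(c_num, dtype=float) - np.asarray(c_den, dtype=float)

    if D is None:
        # Smallest D that keeps all growth-transform numerators positive,
        # plus a margin of 1.0 (a common heuristic, assumed here).
        D = max(0.0, float(np.max(-diff / np.maximum(p, eps)))) + 1.0

    updated = np.maximum(diff + D * p, eps)   # (c_num - c_den + D * p), floored
    return updated / updated.sum()            # renormalize to a distribution


# Toy usage: three words in one history, with counts favoring word 0.
p = np.array([0.5, 0.3, 0.2])
p_new = ebw_update(p, c_num=[4.0, 1.0, 1.0], c_den=[2.0, 2.0, 1.0])
print(p_new)  # probability mass shifts toward word 0
```

In this kind of training, an update of this form is applied to each conditional distribution of the LM, with the smoothing constant chosen per distribution so that all re-estimated probabilities remain positive.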
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2005.1415176