Improved Speech Recognition using Acoustic and Lexical Correlates of Pitch Accent in a N-Best Rescoring Framework

Bibliographic Details
Main Authors: Ananthakrishnan, S., Narayanan, S.
Format: Conference Proceeding
Language: English
Description
Summary: Most statistical speech recognition systems make use of segment-level features, derived mainly from spectral envelope characteristics of the signal, but ignore supra-segmental cues that carry additional information likely to be useful for speech recognition. These cues, which constitute the prosody of the utterance and occur at the syllable, word and utterance level, are closely related to the lexical and syntactic organization of the utterance. In this paper, we explore the use of acoustic and lexical correlates of a subset of these cues in order to improve recognition performance on a read-speech corpus, using word error rate (WER) as the metric. Using the features and methods described in this paper, we were able to obtain a relative WER improvement of 1.3% over a baseline ASR system on the Boston University Radio News Corpus.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2007.367209