
Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer Workshop

Bibliographic Details
Main Authors: Livescu, Karen, Cetin, Ozgur, Hasegawa-Johnson, Mark, King, Simon, Bartels, Chris, Borges, Nash, Kantor, Arthur, Lal, Partha, Yung, Lisa, Bezman, Ari, Dawson-Haggerty, Stephen, Woods, Bronwyn, Frankel, Joe, Magimai-Doss, Mathew, Saenko, Kate
Format: Conference Proceeding
Language: English
Description
Summary: We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the "tandem" approach. In the area of pronunciation modeling, we investigate a model having multiple streams of AF states with soft synchrony constraints, for both audio-only and audio-visual recognition. The models are implemented as dynamic Bayesian networks, and tested on tasks from the Small-Vocabulary Switchboard (SVitchboard) corpus and the CUAVE audio-visual digits corpus. Finally, we analyze AF classification and forced alignment using a newly collected set of feature-level manual transcriptions.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2007.366989