Toward optimizing stream fusion in multistream recognition of speech

Bibliographic Details
Published in: The Journal of the Acoustical Society of America, 2011-07, Vol. 130 (1), p. EL14-EL18
Main Authors: Mesgarani, Nima; Thomas, Samuel; Hermansky, Hynek
Format: Article
Language: English
Description
Summary: A multistream phoneme recognition framework is proposed based on forming streams from different spectrotemporal modulations of speech. Phoneme posterior probabilities were estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During the operation, the best fusion architecture is chosen automatically to maximize the similarity of output statistics to clean condition. Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method.
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/1.3595744
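
Illustration: The selection step described in the summary (choose the fusion whose output statistics are most similar to those of the clean condition) might be sketched as follows. This is a minimal sketch and not the authors' implementation. The summary does not specify the statistical model, the set of candidate fusion architectures, or the similarity measure, so everything here is an assumption: the statistic is taken to be the autocorrelation matrix of the posterior vectors, candidate fusions are averages over subsets of streams, and similarity is a Frobenius distance. The names `fuse`, `output_statistic`, and `choose_fusion` are hypothetical.

```python
from itertools import combinations

import numpy as np


def fuse(posteriors):
    """Average per-stream posteriors (each T x K) and renormalize every frame."""
    fused = np.mean(posteriors, axis=0)
    return fused / fused.sum(axis=1, keepdims=True)


def output_statistic(post):
    """Assumed statistic: K x K autocorrelation matrix of the posterior vectors."""
    return post.T @ post / post.shape[0]


def choose_fusion(stream_posts, clean_stat):
    """Return the stream subset whose fused statistic is closest to clean_stat.

    Assumption: candidate fusions are averages over every non-empty subset
    of streams, compared by Frobenius distance (smaller = more similar).
    """
    best_subset, best_dist = None, np.inf
    n = len(stream_posts)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            fused = fuse([stream_posts[i] for i in subset])
            dist = np.linalg.norm(output_statistic(fused) - clean_stat)
            if dist < best_dist:
                best_subset, best_dist = subset, dist
    return best_subset, best_dist
```

In this sketch, `clean_stat` would be computed once from posteriors obtained on clean speech, and `choose_fusion` would then be called on the per-stream posteriors of each noisy input. Enumerating all subsets grows exponentially with the number of streams, so it is only practical for a small number of streams.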