Sparse modeling of neural network posterior probabilities for exemplar-based speech recognition
Published in: Speech Communication, 2016-02, Vol. 76, pp. 230-244
Main Authors: , ,
Format: Article
Language: English
Summary:
• Automatic speech recognition can be cast as a realization of compressive sensing.
• Posterior probabilities are suitable features for exemplar-based sparse modeling.
• Posterior-based sparse representation meets the statistical speech recognition formalism.
• Dictionary learning reduces the required collection of exemplars and improves performance.
• Collaborative hierarchical sparsity exploits temporal information in continuous speech.
In this paper, a compressive sensing (CS) perspective on exemplar-based speech processing is proposed. Relying on an analytical relationship between the CS formulation and statistical speech recognition (hidden Markov models, HMMs), the automatic speech recognition (ASR) problem is cast as the recovery of a high-dimensional sparse word representation from observed low-dimensional acoustic features. The acoustic features are exemplars obtained from (deep) neural network sub-word conditional posterior probabilities. Low-dimensional word manifolds are learned from these sub-word posterior exemplars and exploited to construct a linguistic dictionary for sparse representation of word posteriors. Dictionary learning proves to be a principled way to alleviate the need for the huge collection of exemplars required by conventional exemplar-based approaches, while still improving performance. Context appending and collaborative hierarchical sparsity are used to exploit the sequential and group structure underlying the word sparse representation. This formulation leads to a posterior-based sparse modeling approach to speech recognition. The potential of the proposed approach is demonstrated on isolated word (Phonebook corpus) and continuous speech (Numbers corpus) recognition tasks.
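As a concrete illustration of the sparse recovery step the abstract describes, the following is a minimal Python sketch, not the paper's implementation: it recovers a sparse combination of posterior exemplars from a synthetic test vector via l1-regularized least squares. The dictionary D, test vector y, dimensions, and regularizer alpha are all illustrative stand-ins, and the paper's collaborative hierarchical (group) sparsity is not modeled here.

```python
# Hedged sketch of the CS view of ASR: a test sub-word posterior vector y
# is approximated as a sparse, nonnegative combination of columns of a
# dictionary D whose columns are sub-word posterior exemplars.
# All sizes and data below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n_subword = 50     # dimension of a sub-word posterior feature (e.g., DNN phone posteriors)
n_exemplars = 200  # exemplars pooled over all words (blocks of columns per word)

# Each dictionary column is normalized to be a probability vector.
D = rng.random((n_subword, n_exemplars))
D /= D.sum(axis=0, keepdims=True)

# Synthesize a test frame as a sparse mixture of a few exemplars of one "word".
x_true = np.zeros(n_exemplars)
x_true[10:13] = [0.5, 0.3, 0.2]
y = D @ x_true

# l1-regularized recovery: min_x ||y - D x||^2 + alpha * ||x||_1, with x >= 0
# (posteriors are nonnegative, so the positivity constraint is natural).
lasso = Lasso(alpha=1e-4, positive=True, max_iter=10_000)
lasso.fit(D, y)
x_hat = lasso.coef_

# Active coefficients should concentrate in one word's block of exemplars.
print("active exemplars:", np.nonzero(x_hat > 1e-3)[0])
```

To reproduce the grouping behavior the abstract attributes to collaborative hierarchical sparsity, the plain Lasso step would be replaced by a group-sparse solver operating on per-word blocks of dictionary columns.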
ISSN: 0167-6393, 1872-7182
DOI: 10.1016/j.specom.2015.06.002