
Rapid speaker adaptation using compressive sensing


Bibliographic Details
Published in: Speech Communication 2013-12, Vol. 55 (10), p. 950-963
Main Authors: Zhang, Wen-Lin; Qu, Dan; Zhang, Wei-Qiang; Li, Bi-Cheng
Format: Article
Language: English
Description
Summary:
• A new speaker adaptation framework using compressive sensing is proposed.
• A speaker dictionary is constructed using all eigenvoices and training speaker models.
• An optimal sparse representation of a speaker model is constructed using the dictionary.
• Matching pursuit and L1 regularized optimization are adapted to solve the problem.
• The new methods outperform the conventional ones under all testing conditions.

Speaker-space-based speaker adaptation methods can perform well even when the amount of adaptation data is limited. However, it is difficult to determine the optimal dimension and basis vectors of the subspace for a particular unknown speaker. Conventional methods, such as eigenvoice (EV) and reference speaker weighting (RSW), can only obtain a sub-optimal speaker subspace. In this paper, we present a new speaker-space-based speaker adaptation framework using compressive sensing. The mean vectors of all mixture components of a conventional Gaussian-mixture-model hidden-Markov-model (GMM-HMM) based speech recognition system are concatenated to form a supervector. The speaker adaptation problem is viewed as recovering the speaker-dependent supervector from limited speech signal observations. A redundant speaker dictionary is constructed by combining all the training speaker supervectors with the supervectors derived from the EV method. Given the adaptation data, the best subspace for a particular speaker is constructed in a maximum a posteriori manner by selecting a proper set of items from this dictionary. Two algorithms, matching pursuit and L1 regularized optimization, are adapted to solve this problem. With an efficient mechanism for removing redundant basis vectors and an iterative update of the speaker coordinate, the matching pursuit based speaker adaptation method is fast and efficient.

The matching pursuit algorithm is greedy and sub-optimal, while direct optimization of the likelihood of the adaptation data with an explicit L1 regularization term can obtain a better approximation of the unknown speaker model. The projected gradient optimization algorithm is adopted, and a few iterations of the matching pursuit algorithm can provide a good initial value. Experimental results show that the matching pursuit algorithm outperforms the conventional methods under all testing conditions. Better performance is obtained when direct L1 regularized optimization is applied. Both methods can select a proper mixed set of the eigenvoice and refer...
ISSN: 0167-6393; 1872-7182
DOI:10.1016/j.specom.2013.06.012
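
Illustration: the summary describes selecting a few items from a redundant dictionary with a greedy pursuit, then refining the result with an L1 regularized objective solved by projected gradient, using the greedy result as the initial value. The following is a minimal sketch of that general idea only, not the paper's method. It assumes a plain least-squares fit in place of the GMM-HMM likelihood, uses orthogonal matching pursuit (a standard variant that refits the selected coefficients at each step), and builds a toy dictionary from two unrelated bases. All function names, the dictionary, and the parameter values (`lam`, `n_iter`, `n_atoms`) are hypothetical.

```python
import numpy as np


def build_dictionary(n=16):
    """Toy redundant dictionary [I | H/sqrt(n)] with unit-norm columns.

    Hypothetical stand-in for a dictionary that mixes two kinds of basis
    vectors; n must be a power of two.
    """
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return np.hstack([np.eye(n), H / np.sqrt(n)])


def orthogonal_matching_pursuit(D, y, n_atoms):
    """Greedily pick n_atoms columns of D, refitting coefficients each step."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_atoms):
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -1.0  # never reselect a chosen column
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the selected columns (assumed surrogate
        # for the likelihood-based coordinate update).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    w = np.zeros(D.shape[1])
    w[support] = coef
    return w


def l1_objective(D, y, w, lam):
    """0.5 * ||y - D w||^2 + lam * ||w||_1."""
    return 0.5 * np.sum((y - D @ w) ** 2) + lam * np.sum(np.abs(w))


def l1_projected_gradient(D, y, lam, w0, n_iter=200):
    """Minimize the L1 objective by writing w = u - v with u, v >= 0 and
    taking gradient steps projected onto the nonnegative orthant."""
    u = np.maximum(w0, 0.0)
    v = np.maximum(-w0, 0.0)
    # The smooth part has curvature at most 2 * ||D||_2^2 in (u, v).
    step = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    for _ in range(n_iter):
        grad = D.T @ (D @ (u - v) - y)
        u, v = (np.maximum(u - step * (grad + lam), 0.0),
                np.maximum(v - step * (lam - grad), 0.0))
    return u - v


# Toy usage: a target built from one column of each half of the dictionary.
D = build_dictionary(16)
w_true = np.zeros(D.shape[1])
w_true[2], w_true[21] = 2.0, -1.5
y = D @ w_true

w_mp = orthogonal_matching_pursuit(D, y, n_atoms=2)    # greedy selection
w_l1 = l1_projected_gradient(D, y, lam=0.05, w0=w_mp)  # warm-started refinement
print("selected columns:", np.flatnonzero(np.abs(w_mp) > 1e-8))
print("L1 objective:", l1_objective(D, y, w_mp, 0.05), "->",
      l1_objective(D, y, w_l1, 0.05))
```

In this sketch the greedy step fixes which dictionary columns are used, while the L1 step trades fit against sparsity through `lam`. The split `w = u - v` is one common way to turn the non-smooth L1 term into a smooth problem with simple nonnegativity constraints, which is what makes a projected gradient step applicable.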