Statistical Utterance Comparison for Speaker Clustering Using Factor Analysis

Bibliographic Details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, 2012-11, Vol. 20 (9), p. 2482-2491
Main Authors: Jeon, Woojay; Ma, Changxue; Macho, Dusan
Format: Article
Language: English
Description
Summary: We propose a novel method of measuring the similarity between two or more speech utterances for speaker clustering, based on probability theory and factor analysis. The similarity function is formulated as the probability that the utterances originated from the same speaker, and uses statistical eigenvoice and eigenchannel models to incorporate physical knowledge of interspeaker and intraspeaker variabilities, allowing the similarity function to be trainable and robust. The comparison function can be efficiently computed using a compact set of sufficient statistics for each speech utterance, allowing the acoustic features to be discarded. We begin using only eigenvoices, and then show how the eigenchannels can be incorporated into the equation to result in an identical form but with a different set of sufficient statistics. We test the proposed model in a speaker clustering task using the CALLHOME telephone conversation corpus and show that it performs better than two other well-known similarity measures: the Cross-Likelihood Ratio (CLR) and Generalized Likelihood Ratio (GLR).
ISSN: 1558-7916, 2329-9290, 1558-7924, 2329-9304
DOI: 10.1109/TASL.2012.2204050