Comparison of speaker normalization techniques for classification of emotionally disturbed subjects based on voice

Bibliographic Details
Main Authors: Subari, K S; Wilkes, D M; Shiavi, R G; Silverman, S E; Silverman, M K
Format: Conference Proceeding
Language: English
Description
Summary: When reviewing his clinical experience in treating suicidal patients, one of the authors observed that successful predictions of suicidality were often based on the patient's voice, independent of content. Research has shown that a Gaussian mixture model of the mel-cepstral features of speech can be used to distinguish the speech of suicidal persons from that of depressed and control persons with high classification rates. Since vocal tract length varies from person to person, can the classification rates for suicidal persons be improved through speaker normalization? We approach this problem by warping the frequency axis of the mel-cepstral features. Two approaches yielded the best results: (i) using the maximum-likelihood approach in a gender-independent database to compute the warping factor for a nonlinear warp, and (ii) using a transformation of the first three formants in a gender-dependent database to compute the warping factor for a linear warp.
DOI: 10.1109/IECBES.2010.5742248
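
Illustration: The summary refers to linear and nonlinear warps of the frequency axis and to a maximum-likelihood choice of the warping factor. The record does not give the exact warp functions or search procedure used in the paper, so the following is only a minimal sketch under stated assumptions: the linear warp is a plain scaling clipped at the Nyquist frequency, the nonlinear warp is the commonly used bilinear (all-pass) form, and the function names, the candidate grid, and the toy scoring function are all hypothetical.

```python
import numpy as np

def linear_warp(freq_hz, alpha, nyquist_hz):
    """Scale the frequency axis by alpha, clipping at the Nyquist frequency."""
    return np.minimum(alpha * np.asarray(freq_hz, dtype=float), nyquist_hz)

def bilinear_warp(omega, alpha):
    """Nonlinear (bilinear / all-pass) warp of normalized frequency omega in [0, pi].

    alpha = 0 leaves the axis unchanged; 0 and pi always map to themselves.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan(
        alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega))
    )

def best_warp_factor(candidates, log_likelihood):
    """Grid search: return the candidate factor with the highest log-likelihood.

    log_likelihood is any callable mapping a warp factor to a score, e.g. the
    log-likelihood of a speaker's warped features under a trained model.
    """
    scores = [log_likelihood(a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Assumed candidate grid; the range used in the paper is not stated in this record.
candidates = np.linspace(0.88, 1.12, 13)

# Toy stand-in for a model likelihood, peaked at 1.04 purely for demonstration.
toy_score = lambda a: -(a - 1.04) ** 2
chosen = best_warp_factor(candidates, toy_score)
```

In practice the scoring callable would warp one speaker's features with each candidate factor and evaluate them against a trained model; the toy function here only stands in for that step.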