Comparison of speaker normalization techniques for classification of emotionally disturbed subjects based on voice
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | When reviewing his clinical experience in treating suicidal patients, one of the authors observed that successful predictions of suicidality were often based on the patient's voice, independent of content. Research has shown that a Gaussian mixture model of the mel-cepstral features of speech can be used to distinguish the speech of suicidal persons from that of depressed and control persons with high classification rates. Since vocal tract length varies from person to person, can the classification rates of suicidal persons be improved through speaker normalization? We approach this problem by warping the frequency axis of the mel-cepstral features. Two different approaches yielded the best results: i) using the maximum-likelihood approach in a gender-independent database to compute the warping factor for a nonlinear warp, and ii) using a transformation of the first three formants in a gender-dependent database to compute the warping factor for a linear warp. |
---|---|
DOI: | 10.1109/IECBES.2010.5742248 |
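
The summary mentions linear and nonlinear warps of the frequency axis but does not give their form. As a rough illustration only, the sketch below shows one common way each kind of warp is written. The function names, the clipping at the Nyquist frequency, and the choice of a bilinear (all-pass) transform for the nonlinear case are assumptions made here, not details taken from the paper.

```python
import math


def linear_warp(freq_hz, alpha, nyquist_hz):
    """Scale the frequency axis by alpha (hypothetical helper).

    Values that would exceed the Nyquist frequency are clipped to it.
    """
    return min(alpha * freq_hz, nyquist_hz)


def bilinear_warp(freq_hz, alpha, nyquist_hz):
    """Nonlinear warp via a bilinear (all-pass) transform (assumed form).

    alpha must lie in (-1, 1); alpha = 0 leaves the axis unchanged.
    The endpoints 0 Hz and the Nyquist frequency map to themselves.
    """
    # Map the frequency to a normalized angle in [0, pi].
    omega = math.pi * freq_hz / nyquist_hz
    warped = omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )
    # Map the warped angle back to hertz.
    return warped * nyquist_hz / math.pi


if __name__ == "__main__":
    nyquist = 8000.0  # assumes 16 kHz sampling, for illustration only
    for f in (500.0, 2000.0, 6000.0):
        print(f, linear_warp(f, 1.1, nyquist), bilinear_warp(f, 0.1, nyquist))
```

In a maximum-likelihood setting, a warping factor would typically be chosen per speaker by trying a grid of `alpha` values and keeping the one under which the warped features score highest against the model. That selection step is not shown here.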