Automatic Elicitation Compliance for Short-Duration Speech Based Depression Detection

Bibliographic Details
Main Authors: Stasak, Brian, Huang, Zhaocheng, Joachim, Dale, Epps, Julien
Format: Conference Proceeding
Language: English
Description
Summary: Detecting depression from the voice in naturalistic environments is challenging, particularly for short-duration audio recordings. This increases the need to interpret and make optimal use of elicited speech. The rapid consonant-vowel syllable combination 'pataka' has frequently been selected as a clinical motor-speech task. However, elicited recordings vary significantly, and this variability remains to be investigated. In this multi-corpus study of over 25,000 'pataka' utterances, speech landmark-based features were found to be sensitive to the number of 'pataka' utterances per recording. This landmark feature sensitivity was newly exploited to automatically estimate 'pataka' count and rate, achieving root mean square errors nearly three times lower than chance level. When count and rate knowledge of the elicited speech was leveraged for depression detection, the estimated 'pataka' number and rate proved important for normalizing evaluative 'pataka' speech data. Count- and/or rate-normalized 'pataka' models produced relative reductions in depression classification error of up to 26% compared with non-normalized models.
ISSN: 2379-190X
DOI: 10.1109/ICASSP39728.2021.9414366