
Inferring clinical depression from speech and spoken utterances

Bibliographic Details
Published in: 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2014-09, p. 1-5
Main Authors: Asgari, Meysam; Shafran, Izhak; Sheeber, Lisa B.
Format: Article
Language:English
Description
Summary: In this paper, we investigate the problem of detecting depression from recordings of subjects' speech using speech processing and machine learning. There has been considerable interest in this problem in recent years due to the potential for developing objective assessments from real-world behaviors, which may provide valuable supplementary clinical information or may be useful in screening. The cues for depression may be present in "what is said" (content) and "how it is said" (prosody). Given the limited amount of text data, even in this relatively large study, it is difficult to employ standard methods of learning models from n-gram features. Instead, we learn models using word representations in an alternative feature space of valence and arousal. This is akin to embedding words into a real vector space, albeit with manual ratings instead of embeddings learned with deep neural networks [1]. For extracting prosody, we employ standard feature extractors such as those implemented in openSMILE and compare them with features extracted from the harmonic models that we have been developing in recent years. Our experiments show that the features from the harmonic model improve the performance of detecting depression from spoken utterances over other alternatives. The content features provide additional improvements, achieving an accuracy of about 74%, which is sufficient to be useful in screening applications.
ISSN: 1551-2541, 2378-928X
DOI: 10.1109/MLSP.2014.6958856
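
Note: The valence/arousal word representation described in the summary can be illustrated with a short sketch. The snippet below is not the authors' implementation; the lexicon values, the feature choice (mean and spread of word ratings per utterance), and the logistic-regression classifier are assumptions made purely for illustration. A published affective lexicon (e.g., ANEW-style norms) would supply the real manual ratings.

    # Illustrative sketch only: represent each utterance by the mean and
    # standard deviation of manually rated (valence, arousal) scores of its
    # words, then train a simple classifier on those features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical manual ratings on a 1-9 scale (placeholder values).
    LEXICON = {
        "happy":    (8.2, 6.5),
        "tired":    (3.7, 2.8),
        "alone":    (2.4, 4.0),
        "great":    (7.8, 6.4),
        "nothing":  (4.3, 3.5),
        "hopeless": (1.9, 4.5),
    }

    def affect_features(utterance):
        """Mean and std of valence/arousal over in-lexicon words."""
        scores = np.array([LEXICON[w] for w in utterance.lower().split()
                           if w in LEXICON])
        if len(scores) == 0:
            # Back off to a neutral midpoint when no word is covered.
            return np.array([5.0, 5.0, 0.0, 0.0])
        return np.concatenate([scores.mean(axis=0), scores.std(axis=0)])

    # Toy labelled utterances (1 = depressed, 0 = not); the paper's data are
    # transcribed recordings of subjects' speech.
    utterances = [
        "i feel hopeless and alone",
        "i am so tired of everything nothing helps",
        "today was great i feel happy",
        "things are going great",
    ]
    labels = [1, 1, 0, 0]

    X = np.stack([affect_features(u) for u in utterances])
    clf = LogisticRegression().fit(X, labels)
    print(clf.predict(X))

Per the summary, such content features would be combined with prosodic features (e.g., openSMILE or harmonic-model features) to reach the reported accuracy; the exact feature sets and classifier used in the paper are described in the full text.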