Evaluating deep learning architectures for Speech Emotion Recognition

Bibliographic Details
Published in:	Neural networks 2017-08, Vol.92, p.60-68
Main Authors:	Fayek, Haytham M.; Lech, Margaret; Cavedon, Lawrence
Format:	Article
Language:	English
Subjects:	Affective computing; Deep learning; Emotion recognition; Emotions; Machine Learning; Neural networks; Neural Networks (Computer); Speech recognition; Speech Recognition Software

Description
Summary:	Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation of SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Our experiments illuminate the advantages and limitations of these architectures in paralinguistic speech recognition, and in emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performance.
ISSN:	0893-6080 1879-2782
DOI:	10.1016/j.neunet.2017.02.013