Loading…

An experimental study of speech emotion recognition based on deep convolutional neural networks

Speech emotion recognition (SER) is a challenging task since it is unclear what kind of features are able to reflect the characteristics of human emotion from speech. However, traditional feature extractions perform inconsistently for different emotion recognition tasks. Obviously, different spectro...

Full description

Saved in:
Bibliographic Details
Main Authors: Zheng, W. Q., Yu, J. S., Zou, Y. X.
Format: Conference Proceeding
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Speech emotion recognition (SER) is a challenging task since it is unclear what kind of features are able to reflect the characteristics of human emotion from speech. However, traditional feature extractions perform inconsistently for different emotion recognition tasks. Obviously, different spectrogram provides information reflecting difference emotion. This paper proposes a systematical approach to implement an effectively emotion recognition system based on deep convolution neural networks (DCNNs) using labeled training audio data. Specifically, the log-spectrogram is computed and the principle component analysis (PCA) technique is used to reduce the dimensionality and suppress the interferences. Then the PCA whitened spectrogram is split into non-overlapping segments. The DCNN is constructed to learn the representation of the emotion from the segments with labeled training speech data. Our preliminary experiments show the proposed emotion recognition system based on DCNNs (containing 2 convolution and 2 pooling layers) achieves about 40% classification accuracy. Moreover, it also outperforms the SVM based classification using the hand-crafted acoustic features.
ISSN:2156-8111
DOI:10.1109/ACII.2015.7344669