
Autoencoder With Emotion Embedding for Speech Emotion Recognition

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, p. 51231-51241
Main Authors: Zhang, Chenghao, Xue, Lei
Format: Article
Language:English
Description
Summary: Speech emotion recognition (SER) is an important part of the human-computer interaction process and has received increasing attention in recent years. However, although a wide diversity of methods has been proposed for SER, their performance remains unsatisfactory. A key obstacle is the effective extraction of emotion-oriented features. In this paper, we propose a novel algorithm, an autoencoder with emotion embedding, to extract deep emotion features. Unlike many previous works, which rely on batch normalization, our model adopts instance normalization, a technique common in the style transfer field. Furthermore, the emotion embedding path in our method guides the autoencoder to efficiently learn a priori knowledge from the emotion label, enabling the model to distinguish which features are most related to human emotion. We concatenate the latent representation learned by the autoencoder with acoustic features obtained by the openSMILE toolkit, and the concatenated feature vector is then used for emotion classification. To improve the generalization of our method, a simple data augmentation approach is applied. Two publicly available and widely used databases, IEMOCAP and EMODB, are chosen to evaluate our method. Experimental results demonstrate that the proposed model achieves significant performance improvement over other speech emotion recognition systems.
ISSN:2169-3536
DOI:10.1109/ACCESS.2021.3069818
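
The abstract describes the architecture in enough detail to sketch its shape: an autoencoder that uses instance normalization in place of batch normalization, an emotion embedding path that conditions the model on the emotion label, and a classifier fed with the concatenation of the learned latent representation and openSMILE acoustic features. The PyTorch sketch below is a minimal illustration of that description, not the authors' implementation; every layer size, the point where the label embedding is injected, and the 6373-dimensional openSMILE vector (the size of the ComParE functional feature set) are assumptions.

```python
import torch
import torch.nn as nn

class EmotionEmbeddingAutoencoder(nn.Module):
    """Hypothetical sketch of an autoencoder with an emotion embedding path.

    All dimensions and the way the label embedding is injected are
    assumptions; the abstract only states that instance normalization
    replaces batch normalization and that an embedding of the emotion
    label guides the autoencoder.
    """

    def __init__(self, n_mels=80, latent_dim=128, n_emotions=4, emb_dim=64):
        super().__init__()
        # Encoder over per-utterance spectrogram frames. InstanceNorm1d
        # normalizes each sample's channels independently (a style-transfer
        # technique), unlike BatchNorm, which normalizes across the batch.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2),
            nn.InstanceNorm1d(256, affine=True),
            nn.ReLU(),
            nn.Conv1d(256, latent_dim, kernel_size=5, padding=2),
            nn.InstanceNorm1d(latent_dim, affine=True),
            nn.ReLU(),
        )
        # Emotion embedding path: a learned embedding of the label that
        # conditions the decoder so reconstruction is emotion-aware.
        self.emotion_emb = nn.Embedding(n_emotions, emb_dim)
        self.decoder = nn.Sequential(
            nn.Conv1d(latent_dim + emb_dim, 256, kernel_size=5, padding=2),
            nn.InstanceNorm1d(256, affine=True),
            nn.ReLU(),
            nn.Conv1d(256, n_mels, kernel_size=5, padding=2),
        )

    def forward(self, x, labels):
        # x: (batch, n_mels, time); labels: (batch,) integer emotion ids
        z = self.encoder(x)                          # latent feature map
        e = self.emotion_emb(labels).unsqueeze(-1)   # (batch, emb_dim, 1)
        e = e.expand(-1, -1, z.size(-1))             # broadcast over time
        recon = self.decoder(torch.cat([z, e], dim=1))
        return recon, z.mean(dim=-1)                 # pooled latent vector


class EmotionClassifier(nn.Module):
    """Classifier over the concatenation of the pooled latent vector and
    an openSMILE acoustic feature vector; the 6373-dim default below
    (the ComParE functional set) is only an illustrative assumption."""

    def __init__(self, latent_dim=128, opensmile_dim=6373, n_emotions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + opensmile_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(512, n_emotions),
        )

    def forward(self, latent, acoustic):
        return self.net(torch.cat([latent, acoustic], dim=1))
```

Under these assumptions, training would jointly minimize a reconstruction loss on the autoencoder output and a cross-entropy loss on the classifier; the abstract does not specify the exact losses, their weighting, or the data augmentation scheme.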