A 3D Tensor Representation of Speech and 3D Convolutional Neural Network for Emotion Recognition
Published in: | Circuits, systems, and signal processing, 2023-07, Vol.42 (7), p.4271-4291 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Speech emotion recognition (SER) is one of the most important approaches to emotional human–computer interaction. In recent years, SER has therefore received a great deal of attention from researchers and has become an interesting and challenging topic. In most SER systems, extracting effective features is the most challenging problem. This paper therefore proposes a novel emotion recognition system based on the reconstructed phase space of speech. In this method, the three-dimensional reconstructed phase space of the speech signal is calculated, and the emotion-related patterns formed in this space are converted into 3D tensors. A three-dimensional convolutional neural network is then employed to analyse the patterns and classify the corresponding emotions. Experiments on two public datasets, Berlin EMO-DB and eNTERFACE05, showed promising results for the proposed scheme, which could significantly improve the performance of speech emotion recognition compared with other studies. |
---|---|
ISSN: | 0278-081X 1531-5878 |
DOI: | 10.1007/s00034-023-02315-4 |
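
The summary describes computing a three-dimensional reconstructed phase space of the speech signal and converting the resulting patterns into 3D tensors. As a rough illustration only, the sketch below uses a standard time-delay embedding followed by a voxel-count histogram. The function names, the delay of 8 samples, and the 32-bin grid are assumptions chosen for this example; this record does not state the parameters or the exact tensor construction used in the article.

```python
import numpy as np

def reconstruct_phase_space(signal, delay=1, dim=3):
    """Time-delay embedding: row n is (x[n], x[n+delay], ..., x[n+(dim-1)*delay]).

    `delay` and `dim` are illustrative values, not taken from the article.
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal too short for this delay and dimension")
    return np.stack(
        [x[i * delay : i * delay + n_points] for i in range(dim)], axis=1
    )

def rps_to_tensor(points, bins=32):
    """Voxelize phase-space points into a bins x bins x bins count tensor,
    scaled so the densest voxel equals 1 (a hypothetical normalization)."""
    lo, hi = points.min(), points.max()
    if hi == lo:  # constant signal: avoid a zero-width histogram range
        hi = lo + 1.0
    counts, _ = np.histogramdd(
        points, bins=bins, range=[(lo, hi)] * points.shape[1]
    )
    return counts / counts.max()

# Toy input: a 5 Hz sine standing in for one frame of speech.
t = np.linspace(0, 1, 1000, endpoint=False)
frame = np.sin(2 * np.pi * 5 * t)

points = reconstruct_phase_space(frame, delay=8)
tensor = rps_to_tensor(points, bins=32)
print(points.shape, tensor.shape)
```

A tensor of this shape, with a leading channel axis added, is the kind of input a 3D convolutional network accepts; the network architecture itself is not described in this record and is not sketched here.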