
A 3D Tensor Representation of Speech and 3D Convolutional Neural Network for Emotion Recognition

Bibliographic Details
Published in: Circuits, Systems, and Signal Processing, 2023-07, Vol.42 (7), p.4271-4291
Main Authors: Falahzadeh, Mohammad Reza, Farokhi, Fardad, Harimi, Ali, Sabbaghi-Nadooshan, Reza
Format: Article
Language:English
Description
Summary: Speech emotion recognition (SER) is one of the most important approaches to emotional human–computer interaction. In recent years it has therefore received a great deal of attention from researchers and has become an active and challenging research topic. In most SER systems, extracting effective features is the most challenging problem. This paper therefore proposes a novel emotion recognition system based on the reconstructed phase space of speech. In this method, the three-dimensional reconstructed phase space of the speech signal is calculated. The emotion-related patterns formed in this space are then converted into 3D tensors, and a three-dimensional convolutional neural network is employed to analyse the patterns and classify the corresponding emotions. Experiments on two public datasets, Berlin EMO-DB and eNTERFACE05, showed promising results for the proposed scheme, which could significantly improve speech emotion recognition performance compared with other studies.
ISSN: 0278-081X, 1531-5878
DOI:10.1007/s00034-023-02315-4
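
Illustration: the summary describes computing a three-dimensional reconstructed phase space from a speech signal and converting the resulting patterns into 3D tensors. The record does not give the construction details, so the following is only a minimal sketch under stated assumptions: a standard time-delay embedding is used for the phase space, and a normalized 3D occupancy histogram stands in for the tensor conversion. The function names, the `delay` and `bins` parameters, and the synthetic test signal are all hypothetical and are not taken from the paper.

```python
import numpy as np


def reconstruct_phase_space(signal, delay=1, dim=3):
    """Time-delay embedding: row n is (x[n], x[n + delay], ..., x[n + (dim-1)*delay]).

    Assumption: a plain delay embedding; the paper's exact settings are not
    given in this record.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_points = signal.size - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal is too short for the requested delay and dimension")
    # Stack delayed copies of the signal as columns.
    columns = [signal[i * delay : i * delay + n_points] for i in range(dim)]
    return np.stack(columns, axis=1)


def phase_space_to_tensor(points, bins=32):
    """Convert 3D phase-space points into a normalized occupancy tensor.

    Assumption: a 3D histogram over a shared value range is used as the
    tensor; the paper may build its tensors differently.
    """
    lo, hi = points.min(), points.max()
    if hi == lo:
        hi = lo + 1.0  # avoid a zero-width range for constant signals
    hist, _ = np.histogramdd(points, bins=(bins,) * 3, range=[(lo, hi)] * 3)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Synthetic stand-in for a speech frame (hypothetical data, not a real recording).
t = np.linspace(0.0, 1.0, 2000)
frame = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 13 * t)

points = reconstruct_phase_space(frame, delay=8)
tensor = phase_space_to_tensor(points, bins=32)
print(points.shape, tensor.shape)
```

A tensor of this shape, with a leading channel axis added, is the kind of volumetric input a three-dimensional convolutional network accepts; the network architecture itself is not described in this record and is not sketched here.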