Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 36018-36027
Main Authors: Andayani, Felicia; Theng, Lau Bee; Tsun, Mark Teekit; Chua, Caslon
Format: Article
Language:English
Description
Summary: Emotion is a vital component of daily human communication and helps people understand one another, which makes emotion recognition crucial for human-computer interaction. Speech Emotion Recognition (SER) identifies emotion signals conveyed through human speech or daily conversation, where the expressed emotion depends strongly on temporal information. Although much existing research has shown that hybrid systems outperform the traditional single classifiers used in SER, each approach has its own limitations. This paper therefore proposes a hybrid of a Long Short-Term Memory (LSTM) network and a Transformer encoder to learn the long-term dependencies in speech signals and classify emotions. Speech features are extracted as Mel Frequency Cepstral Coefficients (MFCCs) and fed into the proposed hybrid LSTM-Transformer classifier. A range of performance evaluations indicates that the model achieves a significant improvement in recognition over models reported in other published works, reaching 75.62%, 85.55%, and 72.49% recognition accuracy on the RAVDESS, Emo-DB, and language-independent datasets, respectively.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3163856
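
The abstract describes a simple pipeline: MFCC features are extracted from each audio file, an LSTM models the temporal structure of the feature sequence, and a Transformer encoder attends over the LSTM outputs before a final classification layer. The following is a minimal illustrative sketch of that pipeline, not the authors' code: it assumes librosa for MFCC extraction and PyTorch for the model, and every hyperparameter (40 MFCCs, 16 kHz audio, hidden size 128, 4 attention heads, 2 encoder layers, 8 emotion classes) is an assumption chosen for illustration.

import librosa
import torch
import torch.nn as nn

def extract_mfcc(path, n_mfcc=40):
    # Load the audio file and return an (n_frames, n_mfcc) MFCC sequence.
    signal, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return torch.from_numpy(mfcc.T).float()  # time-major for the LSTM

class LSTMTransformerClassifier(nn.Module):
    def __init__(self, n_mfcc=40, hidden=128, n_heads=4, n_layers=2, n_classes=8):
        super().__init__()
        # The LSTM summarizes local temporal dependencies in the MFCC sequence.
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        # The Transformer encoder attends over the whole LSTM output sequence,
        # capturing the long-term dependencies the abstract emphasizes.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_mfcc)
        h, _ = self.lstm(x)               # (batch, time, hidden)
        h = self.encoder(h)               # (batch, time, hidden)
        return self.head(h.mean(dim=1))   # pool over time, then classify

# Example with a hypothetical file name:
# logits = LSTMTransformerClassifier()(extract_mfcc("clip.wav").unsqueeze(0))

Mean-pooling the encoder outputs before the linear head is one common choice; the paper itself may pool differently or use a dedicated classification token, so treat this sketch only as a reading aid for the abstract.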