
DESCU: Dyadic emotional speech corpus and recognition system for Urdu language

Bibliographic Details
Published in: Speech Communication, 2023-03, Vol. 148, pp. 40-52
Main Authors: Qasim, Muhammad; Habib, Tania; Urooj, Saba; Mumtaz, Benazir
Format: Article
Language: English
Description
Summary: A speech signal carries the emotional state of the speaker along with the message. Recognizing a speaker's emotional state helps in determining the true meaning of a message and allows for more natural communication between humans and machines. This paper presents the design and development of a dyadic emotional speech corpus for the Urdu language. The corpus is developed by recording dialog scenarios for the angry, happy, neutral, and sad emotions. The performance of frame-level features, utterance-level features, and spectrograms has been evaluated in this work. Emotion recognition experiments have been conducted using classifiers including Support Vector Machines, Hidden Markov Models, and Convolutional Neural Networks. Experimental results show that the utterance-level features outperform the frame-level features and spectrograms. The combined feature set of cepstral, spectral, prosodic, and voice quality features performs better than the individual feature sets. Unweighted average recalls of 84.1%, 80.2%, and 84.7% have been achieved for speaker-dependent, speaker-independent, and text-independent emotion recognition, respectively.
•Development of a scenario-based dyadic emotional corpus for the Urdu language.
•Utterance-level features give the highest accuracy for the emotion recognition task.
•The combined feature set gives the highest emotion recognition accuracy.
•Accuracy of 80.2% achieved for speaker-independent emotion recognition.
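The accuracies reported in the abstract are unweighted average recalls (UAR), i.e. the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. As a reference only, here is a minimal sketch of that metric (the paper's own evaluation code is not given; the function name is illustrative):

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: each class contributes equally,
    which avoids inflating scores on class-imbalanced test sets."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in np.unique(y_true):
        mask = (y_true == c)                      # samples of this class
        recalls.append(np.mean(y_pred[mask] == c))  # recall for class c
    return float(np.mean(recalls))
```

With four emotion classes (e.g. angry, happy, neutral, sad), a UAR of 84.7% therefore means the recalls of the four classes average to 0.847, even if the classes differ in size.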
ISSN:0167-6393
1872-7182
DOI:10.1016/j.specom.2023.02.002