Emotion Recognition using Speech Data with Convolutional Neural Network

Bibliographic Details
Main Authors: Pham, Hoang Minh, Noori, Farzan Majeed, Tørresen, Jim
Format: Book
Language: Norwegian
Description
Summary: Identifying emotion from speech has a wide range of applications and has drawn particular research interest as a way to improve the human-computer interaction experience. Traditional machine learning approaches usually face the challenge of selecting the optimal feature set for each application. Deep learning, on the other hand, allows end-to-end development of models with inherent feature extraction. In this study, we evaluate the performance of a Convolutional Neural Network on different kinds of spectral features of acoustic signals drawn from two popular open databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Berlin Database of Emotional Speech (EmoDB). The deep learning model identifies two to eight classes of emotions for RAVDESS and two to seven classes for EmoDB. The results, in terms of unweighted average recall, are 0.888 (two classes) and 0.694 (eight classes) for the RAVDESS dataset. The corresponding results for the EmoDB dataset are 0.993 (two classes) and 0.764 (seven classes).
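The abstract reports results as unweighted average recall (UAR), which is the mean of the per-class recalls with every class counted equally, regardless of how many samples it has. The sketch below illustrates that standard definition only; the function name `unweighted_average_recall` and the toy labels are assumptions made for illustration and are not taken from the study's code or data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls, weighting each class equally.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of one class = correct predictions / samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (not from RAVDESS or EmoDB): four "calm", two "angry".
y_true = ["calm", "calm", "calm", "calm", "angry", "angry"]
y_pred = ["calm", "calm", "calm", "angry", "angry", "calm"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recall is 3/4 for "calm" and 1/2 for "angry", so the UAR is 0.625 while plain accuracy is 4/6. The gap shows why UAR is a common choice when emotion classes are unevenly represented: the smaller class affects the score as much as the larger one.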