Loading…

Speech emotion recognition using data augmentation method by cycle-generative adversarial networks

One of the obstacles in developing speech emotion recognition (SER) systems is the data scarcity problem, i.e., the lack of labeled data for training these systems. Data augmentation is an effective method for increasing the amount of training data. In this paper, we propose a cycle-generative adver...

Full description

Saved in:
Bibliographic Details
Published in:Signal, image and video processing image and video processing, 2022-10, Vol.16 (7), p.1955-1962
Main Authors: Shilandari, Arash, Marvi, Hossein, Khosravi, Hossein, Wang, Wenwu
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:One of the obstacles in developing speech emotion recognition (SER) systems is the data scarcity problem, i.e., the lack of labeled data for training these systems. Data augmentation is an effective method for increasing the amount of training data. In this paper, we propose a cycle-generative adversarial network (cycle-GAN) for data augmentation in the SER systems. For each of the five emotions considered, an adversarial network is designed to generate data that have a similar distribution to the main data in that class but have a different distribution to those of other classes. These networks are trained in an adversarial way to produce feature vectors similar to those in the training set, which are then added to the original training sets. Instead of using the common cross-entropy loss to train cycle-GANs, we use the Wasserstein divergence to mitigate the gradient vanishing problem and to generate high-quality samples. The proposed network has been applied to SER using the EMO-DB dataset. The quality of the generated data is evaluated using two classifiers based on support vector machine and deep neural network. The results showed that the recognition accuracy in unweighted average recall was about 83.33%, which is better than the baseline methods compared.
ISSN:1863-1703
1863-1711
DOI:10.1007/s11760-022-02156-9