Hybrid BBO_PSO and higher order spectral features for emotion and stress recognition from natural speech

Bibliographic Details
Published in: Applied Soft Computing, July 2017, Vol. 56, pp. 217-232
Main Authors: C.K., Yogesh; Hariharan, M.; Ngadiran, Ruzelita; Adom, A.H.; Yaacob, Sazali; Polat, Kemal
Format: Article
Language: English
Description
Summary:
• We propose higher order spectral (HOS) features based on the bispectrum and bicoherence for multi-class emotion/stress recognition from speech signals.
• Utterances from three emotional speech databases, namely BES, SAVEE and SUSAS, are used in this work.
• Multi-cluster feature selection and a hybrid of biogeography-based optimization and particle swarm optimization (HBBO_PSO) are used for feature selection.
• Experimental results show the effectiveness and efficiency of the proposed method, which yields higher emotion/stress recognition rates.

The aim of the present study is to select a set of higher order spectral features for an emotion/stress recognition system. Fifty higher order spectral features, 28 bispectral and 22 bicoherence features, were extracted from the speech signal and its glottal waveform. These features were combined with the Interspeech 2010 features to further improve the recognition rates. Feature subset selection (FSS) was carried out with the objective of maximizing the subject-independent emotion recognition rate using a minimum number of features. FSS has two stages. In Stage 1, multi-cluster feature selection was adopted to reduce the feature space and identify a relevant feature subset from the Interspeech 2010 features. In Stage 2, biogeography-based optimization (BBO), particle swarm optimization (PSO) and the proposed hybrid BBO_PSO optimization were applied to further reduce the dimension of the feature space and identify the most relevant feature subset, i.e. the one with higher ability to discriminate between emotional states. The proposed method was tested on three databases: the Berlin emotional speech database (BES), the Surrey audio-visual expressed emotion database (SAVEE), and the simulated domain of the Speech under simulated and actual stress (SUSAS) database.
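The abstract does not say how the bispectral and bicoherence features were computed. As a point of reference, the sketch below uses the standard direct (FFT-based) estimator: the signal is split into K windowed segments, the bispectrum is averaged as B(f1, f2) = (1/K) Σ X_k(f1) X_k(f2) X_k*(f1 + f2), and the squared bicoherence normalizes it to the range [0, 1]. The segment length (nfft = 128), 50% overlap, Hann window and the function name `bispectrum_and_bicoherence` are assumptions for illustration. The paper's 28 bispectral and 22 bicoherence features are presumably summary measures derived from estimates like these, but which measures were used is not stated here.

```python
import numpy as np


def bispectrum_and_bicoherence(x, nfft=128, overlap=0.5):
    """Direct (FFT-based) estimate of the bispectrum B and squared bicoherence b2.

    Both outputs have shape (nfft // 2, nfft // 2) and are indexed by FFT bins (f1, f2).
    Assumed parameters (not taken from the paper): nfft, overlap, Hann window.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < nfft:
        raise ValueError("signal must be at least nfft samples long")
    step = max(1, int(nfft * (1 - overlap)))
    window = np.hanning(nfft)
    # Split into overlapping segments, remove each segment's mean, apply the window.
    segments = np.array([
        (x[s:s + nfft] - x[s:s + nfft].mean()) * window
        for s in range(0, len(x) - nfft + 1, step)
    ])
    X = np.fft.fft(segments, axis=1)            # shape (K, nfft)

    half = nfft // 2
    f1, f2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    f3 = f1 + f2                                 # sum frequency, always < nfft here
    # Per-segment triple products X(f1) X(f2) X*(f1 + f2), shape (K, half, half).
    triple = X[:, f1] * X[:, f2] * np.conj(X[:, f3])
    B = triple.mean(axis=0)                      # bispectrum: average over segments

    # Squared bicoherence; the Cauchy-Schwarz inequality guarantees 0 <= b2 <= 1.
    num = np.abs(triple.sum(axis=0)) ** 2
    den = (np.abs(X[:, f1] * X[:, f2]) ** 2).sum(axis=0) * (np.abs(X[:, f3]) ** 2).sum(axis=0)
    b2 = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return B, b2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = np.arange(4096)
    p1, p2 = rng.uniform(0, 2 * np.pi, size=2)
    # Components at FFT bins 16 and 24 plus a phase-coupled component at bin 40 (= 16 + 24).
    x = (np.cos(2 * np.pi * 16 * n / 128 + p1)
         + np.cos(2 * np.pi * 24 * n / 128 + p2)
         + np.cos(2 * np.pi * 40 * n / 128 + p1 + p2)
         + 0.5 * rng.standard_normal(n.size))
    B, b2 = bispectrum_and_bicoherence(x)
    print(f"b2 at coupled pair (16, 24): {b2[16, 24]:.3f}")
    print(f"median b2 over all bin pairs: {np.median(b2):.3f}")
```

In the demo, the quadratically phase-coupled component at bin 40 drives b2 at (16, 24) close to 1, while bin pairs without coupling stay near 0. This sensitivity to phase coupling, which the power spectrum cannot see, is the usual motivation for higher order spectral features.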
The proposed feature set was evaluated in subject-independent (SI), subject-dependent (SD), gender-dependent male (GD-male), gender-dependent female (GD-female), text-independent pairwise speech (TIDPS) and text-independent multi-style speech (TIDMSS) experiments using support vector machine (SVM) and extreme learning machine (ELM) classifiers. The proposed method attained accuracies of 93.25% (SI), 100% (SD), 93.75% (GD-male) and 97.58% (GD-female) on BES; 62.38% (SI) and 76.19% (SD) on SAVEE; and 90.09% (TIDMSS), 97.04% (TIDPS, Angry vs. Neutral), 98.89% (TIDPS, Lombard vs. Neutral) and 99.07% (TIDPS, Loud vs. Neutral) on SUSAS.
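The abstract names SVM and ELM classifiers but not their settings. The sketch below shows the basic extreme learning machine formulation (a random, untrained sigmoid hidden layer whose output weights are solved in closed form with the Moore-Penrose pseudoinverse) inside a leave-one-speaker-out loop, which is one common way to realize a subject-independent (SI) protocol. The hidden-layer size of 50, the per-fold z-scoring, the names `ELM`, `leave_one_speaker_out` and `make_toy_data`, and the synthetic data are all hypothetical; none of it reproduces the paper's setup or results.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine (basic formulation)."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden  # assumed default, not the paper's setting
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of randomly projected inputs; these weights are never trained.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.classes_ = np.unique(y)
        T = (np.asarray(y)[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # Output weights: least-squares solution of H @ beta = T via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def leave_one_speaker_out(X, y, speakers, make_model):
    """Mean accuracy over folds that each hold out every utterance of one speaker."""
    accs = []
    for spk in np.unique(speakers):
        test = speakers == spk
        # z-score with training-fold statistics only, so the held-out speaker stays unseen.
        mu = X[~test].mean(axis=0)
        sd = X[~test].std(axis=0) + 1e-12
        model = make_model().fit((X[~test] - mu) / sd, y[~test])
        accs.append(np.mean(model.predict((X[test] - mu) / sd) == y[test]))
    return float(np.mean(accs))


def make_toy_data(seed=0, n_speakers=4, n_classes=3, per_cell=30, dim=10):
    """Hypothetical synthetic data: class-dependent means plus a per-speaker offset."""
    rng = np.random.default_rng(seed)
    class_means = rng.normal(0.0, 3.0, size=(n_classes, dim))
    X, y, spk = [], [], []
    for s in range(n_speakers):
        offset = rng.normal(0.0, 0.5, size=dim)
        for c in range(n_classes):
            X.append(class_means[c] + offset + rng.standard_normal((per_cell, dim)))
            y += [c] * per_cell
            spk += [s] * per_cell
    return np.vstack(X), np.array(y), np.array(spk)


if __name__ == "__main__":
    X, y, spk = make_toy_data()
    acc = leave_one_speaker_out(X, y, spk, lambda: ELM(n_hidden=50))
    print(f"leave-one-speaker-out accuracy: {acc:.3f}")
```

Holding out every utterance of a speaker, rather than a random subset of utterances, keeps the test speaker unseen during training; a random split that mixes speakers across training and test sets corresponds more closely to the subject-dependent (SD) setting.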
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2017.03.013