A Feature Fusion Method Based on Extreme Learning Machine for Speech Emotion Recognition
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Speech emotion recognition is important for understanding users' intentions in human-computer interaction. However, it is a challenging task, partly because it is unclear which features and models are effective for distinguishing emotions. Previous studies apply a convolutional neural network (CNN) directly to spectrograms to extract features, and bidirectional long short-term memory (BLSTM) is the state-of-the-art model. However, CNN-BLSTM has two problems. First, it does not use heuristic features based on prior knowledge. Second, BLSTM has a complex structure and high training complexity. To address the first problem, we propose a feature fusion method that combines CNN-based features with heuristic-based discriminative features, which are extracted from heuristic features using a deep neural network (DNN). To address the second problem, we use an extreme learning machine (ELM) instead of BLSTM. Experiments conducted on EmoDB show that our method leads to a 40% relative error reduction in F1-score compared to CNN-BLSTM. |
---|---|
ISSN: | 2379-190X |
DOI: | 10.1109/ICASSP.2018.8462219 |
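
The summary describes fusing two feature sets and classifying them with an extreme learning machine. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes fusion by simple concatenation, a single tanh hidden layer, and a ridge-regularised least-squares solve for the output weights. The function names (`fuse`, `train_elm`, `predict_elm`) and all hyperparameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def fuse(cnn_feats, heuristic_feats):
    # Assumption: fusion is plain concatenation along the feature axis.
    return np.concatenate([cnn_feats, heuristic_feats], axis=1)

def train_elm(X, y, n_hidden=64, n_classes=None, reg=1e-3, seed=0):
    # y must be an integer array of class labels starting at 0.
    rng = np.random.default_rng(seed)
    n_classes = n_classes or int(y.max()) + 1
    # Hidden-layer weights are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden activations
    T = np.eye(n_classes)[y]          # one-hot targets
    # Only the output weights are fitted, in closed form (ridge regression).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```

Because only the output weights are solved for, and in closed form, training avoids the iterative backpropagation through time that a BLSTM requires, which is the kind of training cost the summary refers to.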