Hierarchical convolutional neural networks with post-attention for speech emotion recognition

Bibliographic Details
Published in:	Neurocomputing (Amsterdam) 2025-01, Vol.615, p.128879, Article 128879
Main Authors:	Fan, Yonghong; Huang, Heming; Han, Henry
Format:	Article
Language:	English
Subjects:	Class-discriminative features; hc-former; Long-term dependence; Spatiotemporal information; Speech emotion recognition

Description
Summary:	Speech emotion recognition (SER) is a key prerequisite for natural human–computer interaction. However, existing SER systems still face great challenges, particularly in the extraction of discriminative and high-quality emotional features. To address this challenge, this study proposes hc-former, a hierarchical convolutional neural network (CNN) with post-attention. Unlike traditional CNNs and recurrent neural networks (RNNs), our model adeptly extracts potent class-discriminative features that integrate spatiotemporal information and long-term dependence. The class-discriminative features extracted by hc-former, which emphasize both interclass separation and intraclass compactness, can more effectively represent different class emotions often confused with one another, leading to superior classification results. Our experimental results further indicate the exceptional performance of hc-former for SER on benchmark datasets, surpassing other peer models in terms of performance while utilizing fewer parameters.
Highlights:
• hc-former consists of hCNN and PA.
• hCNN efficiently extracts spatiotemporal information.
• PA captures vital long-term dependencies.
• hc-former outperforms peer models with fewer parameters for the SER task.
• ER-F loss helps mitigate sample imbalance and improve SER accuracy.
ISSN:	0925-2312
DOI:	10.1016/j.neucom.2024.128879