Multilevel Transformer for Multimodal Emotion Recognition

Bibliographic Details
Main Authors:	He, Junyi; Wu, Meimei; Li, Meng; Zhu, Xiaobo; Ye, Feng
Format:	Conference Proceeding
Language:	English
Subjects:	Acoustics; Bert; Codes; Emotion recognition; fine-grained interaction; highway network; multi-granularity emotion recognition; multilevel transformer; Signal processing; Speech recognition; Task analysis; Transformers

Description
Summary:	Multimodal emotion recognition has attracted much attention recently. Fusing multiple modalities effectively with limited labeled data is a challenging task. Considering the success of pre-trained models and the fine-grained nature of emotion expression, we think it is reasonable to take both aspects into consideration. Unlike previous methods that mainly focus on one aspect, we introduce a novel multi-granularity framework, which combines fine-grained representation with pre-trained utterance-level representation. Inspired by Transformer TTS, we propose a multilevel transformer model to perform fine-grained multimodal emotion recognition. Specifically, we explore different methods to incorporate phoneme-level embedding with word-level embedding. To perform multi-granularity learning, we simply combine the multilevel transformer model with BERT. Extensive experimental results show that the multilevel transformer model outperforms previous state-of-the-art approaches on the IEMOCAP dataset. The multi-granularity model achieves a further performance improvement.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49357.2023.10097110
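The summary mentions incorporating phoneme-level embeddings into word-level embeddings and then combining the fine-grained result with a pre-trained utterance-level representation, but it does not say how. The following is a minimal sketch of that data flow only, assuming a phoneme-to-word alignment is available as half-open index spans and using mean pooling plus concatenation as one hypothetical fusion choice. The function names (`mean_pool`, `fuse_phonemes_into_words`, `multi_granularity`) are illustrative, not the authors' code; the actual model uses multilevel transformer encoders rather than plain pooling.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty span")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_phonemes_into_words(
    phoneme_emb: Sequence[Vector],
    word_emb: Sequence[Vector],
    alignment: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Hypothetical fusion: mean-pool the phonemes aligned to each word
    (half-open span [start, end)) and append the result to the word vector."""
    if len(word_emb) != len(alignment):
        raise ValueError("need exactly one phoneme span per word")
    fused = []
    for w_vec, (start, end) in zip(word_emb, alignment):
        pooled = mean_pool(list(phoneme_emb[start:end]))
        fused.append(list(w_vec) + pooled)
    return fused


def multi_granularity(fine_grained: Sequence[Vector], utterance_vec: Vector) -> Vector:
    """Hypothetical combination: mean-pool the fine-grained word-level vectors
    and concatenate an utterance-level vector (assumed to come from BERT)."""
    return mean_pool(fine_grained) + list(utterance_vec)


# Toy example: three 2-d phoneme vectors, two 1-d word vectors.
phonemes = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
words = [[0.5], [0.7]]
alignment = [(0, 2), (2, 3)]  # word 0 <- phonemes 0..1, word 1 <- phoneme 2
fused = fuse_phonemes_into_words(phonemes, words, alignment)
print(fused)  # [[0.5, 2.0, 1.0], [0.7, 5.0, 4.0]]
combined = multi_granularity(fused, [9.0])
```

Concatenation keeps the word and phoneme information in separate dimensions; summation or attention-based fusion would be equally plausible alternatives under the same assumptions.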