MGAT: Multi-Granularity Attention Based Transformers for Multi-Modal Emotion Recognition

Bibliographic Details
Main Authors:	Fan, Weiquan, Xing, Xiaofen, Cai, Bolun, Xu, Xiangmin
Format:	Conference Proceeding
Language:	English
Subjects:	Emotion recognition; Human computer interaction; Multi-granularity attention; Multi-modal emotion recognition; Signal processing; Signal processing algorithms; Speech recognition; Transformers

Description
Summary:	Multi-modal emotion recognition is crucial for human-computer interaction. Many existing algorithms attempt to achieve multi-modal interaction through a cross-attention mechanism. Because the original attention mechanism introduces noise and is computationally heavy, window attention has become a new trend. However, emotions are presented asynchronously across modalities, which makes it difficult for emotional information to interact between windows. Furthermore, multi-modal data are temporally misaligned, so a single fixed window size can hardly describe cross-modal information. In this paper, we put these two issues into a unified framework and propose the multi-granularity attention based Transformers (MGAT). It addresses the emotional asynchrony and modality misalignment issues through a multi-granularity attention mechanism. Experimental results confirm the effectiveness of our method, which achieves state-of-the-art performance on IEMOCAP.
ISSN:	2379-190X
DOI:	10.1109/ICASSP49357.2023.10095855