
MGAT: Multi-Granularity Attention Based Transformers for Multi-Modal Emotion Recognition

Multi-modal emotion recognition is crucial for human-computer interaction. Many existing algorithms attempt to achieve multi-modal interaction through a cross-attention mechanism. Because the original attention mechanism introduces noise and is computationally heavy, window attention has become a new trend. However, emotions are presented asynchronously across modalities, which makes it difficult for emotional information to interact between windows. Furthermore, multi-modal data are temporally misaligned, so a single fixed window size struggles to describe cross-modal information. In this paper, we put these two issues into a unified framework and propose the multi-granularity attention-based Transformers (MGAT), which addresses emotional asynchrony and modality misalignment through a multi-granularity attention mechanism. Experimental results confirm the effectiveness of our method, which achieves state-of-the-art performance on IEMOCAP.
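The record does not include the paper itself, so the following is a minimal, illustrative PyTorch sketch of the idea the abstract describes: cross-attention restricted to local time windows, run at several window sizes ("granularities") and fused, so that emotional cues that are asynchronous or misaligned across modalities can still interact at some granularity. All names, shapes, window sizes, and the averaging fusion below are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class WindowedCrossAttention(nn.Module):
    """Cross-attention restricted to fixed-size, 1:1-aligned time windows."""
    def __init__(self, dim: int, window_size: int, num_heads: int = 4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query/context: (batch, time, dim); time is assumed divisible by window_size.
        b, t, d = query.shape
        w = self.window_size
        q = query.reshape(b * t // w, w, d)    # contiguous query windows
        k = context.reshape(b * t // w, w, d)  # matching context windows
        out, _ = self.attn(q, k, k)            # attend only within each window pair
        return out.reshape(b, t, d)

class MultiGranularityAttention(nn.Module):
    """Run windowed cross-attention at several granularities and average them."""
    def __init__(self, dim: int, window_sizes=(4, 8, 16)):
        super().__init__()
        self.branches = nn.ModuleList(
            WindowedCrossAttention(dim, w) for w in window_sizes
        )

    def forward(self, audio: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Cues missed by a small window can still interact through a larger one.
        outs = [branch(audio, text) for branch in self.branches]
        return torch.stack(outs, dim=0).mean(dim=0)

# Toy usage: 2 utterances, 32 time steps, 64-dim features per modality.
mga = MultiGranularityAttention(dim=64)
audio, text = torch.randn(2, 32, 64), torch.randn(2, 32, 64)
fused = mga(audio, text)  # (2, 32, 64)

Averaging the granularity branches is the simplest possible fusion; the actual MGAT fusion strategy is not described in this record.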

Bibliographic Details
Main Authors: Fan, Weiquan; Xing, Xiaofen; Cai, Bolun; Xu, Xiangmin
Format: Conference Proceeding
Language: English
Subjects: Emotion recognition; Human computer interaction; multi-granularity attention; multi-modal emotion recognition; Signal processing; Signal processing algorithms; Speech recognition; Transformers
Online Access: Request full text
container_end_page 5
container_start_page 1
creator Fan, Weiquan; Xing, Xiaofen; Cai, Bolun; Xu, Xiangmin
description Multi-modal emotion recognition is crucial for human-computer interaction. Many existing algorithms attempt to achieve multi-modal interaction through a cross-attention mechanism. Because the original attention mechanism introduces noise and is computationally heavy, window attention has become a new trend. However, emotions are presented asynchronously across modalities, which makes it difficult for emotional information to interact between windows. Furthermore, multi-modal data are temporally misaligned, so a single fixed window size struggles to describe cross-modal information. In this paper, we put these two issues into a unified framework and propose the multi-granularity attention-based Transformers (MGAT), which addresses emotional asynchrony and modality misalignment through a multi-granularity attention mechanism. Experimental results confirm the effectiveness of our method, which achieves state-of-the-art performance on IEMOCAP.
doi_str_mv 10.1109/ICASSP49357.2023.10095855
format conference_proceeding
identifier EISSN: 2379-190X; EISBN: 9781728163277
publisher IEEE
ispartof ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, p.1-5
issn 2379-190X
language eng
recordid cdi_ieee_primary_10095855
source IEEE Xplore All Conference Series
subjects Emotion recognition
Human computer interaction
multi-granularity attention
multi-modal emotion recognition
Signal processing
Signal processing algorithms
Speech recognition
Transformers
title MGAT: Multi-Granularity Attention Based Transformers for Multi-Modal Emotion Recognition