
Speech emotion recognition based on multi‐feature and multi‐lingual fusion

A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in speech emotion recognition. First, handcrafted and deep automatic fe...


Bibliographic Details
Published in: Multimedia tools and applications, 2022-02, Vol. 81 (4), p. 4897-4907
Main Authors: Wang, Chunyi, Ren, Ying, Zhang, Na, Cui, Fuwei, Luo, Shiying
Format: Article
Language: English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access: Get full text
container_end_page 4907
container_issue 4
container_start_page 4897
container_title Multimedia tools and applications
container_volume 81
creator Wang, Chunyi
Ren, Ying
Zhang, Na
Cui, Fuwei
Luo, Shiying
description A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in speech emotion recognition. First, handcrafted and deep automatic features are extracted from existing Chinese and English emotional speech data. Then, the two kinds of features are fused for each language. Finally, the fused features of the different languages are fused again and used to train a classification model. Comparing the fused features with the unfused ones shows that the fused features significantly improve the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora and is shown to provide more accurate predictions than the original solution. As a result of this study, the multi-feature and multi-lingual fusion algorithm can significantly improve speech emotion recognition accuracy when the dataset is small.
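The abstract describes a two-stage pipeline: per-language fusion of handcrafted and deep features, followed by cross-lingual fusion before classification. The paper's exact fusion operator is not given in this record, so the sketch below assumes simple concatenation; all function names and feature dimensionalities are hypothetical, chosen only to illustrate the data flow.

```python
import numpy as np

def fuse_features(handcrafted, deep):
    # Stage 1 (assumption): early fusion of handcrafted and deep automatic
    # features for one language, by vector concatenation.
    return np.concatenate([handcrafted, deep], axis=-1)

def cross_lingual_fuse(zh_features, en_features):
    # Stage 2 (assumption): fuse the per-language fused vectors again,
    # also by concatenation, before training a classifier on the result.
    return np.concatenate([zh_features, en_features], axis=-1)

# Toy example with hypothetical dimensions: 48 handcrafted + 128 deep features.
zh = fuse_features(np.zeros(48), np.zeros(128))  # Chinese utterance features
en = fuse_features(np.zeros(48), np.zeros(128))  # English utterance features
fused = cross_lingual_fuse(zh, en)
print(fused.shape)  # prints (352,)
```

The fused vector would then feed a standard classifier; the abstract does not specify which model the authors train.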
doi_str_mv 10.1007/s11042-021-10553-4
format article
fulltext fulltext
identifier ISSN: 1380-7501
ispartof Multimedia tools and applications, 2022-02, Vol.81 (4), p.4897-4907
issn 1380-7501
1573-7721
language eng
recordid cdi_proquest_journals_2631476143
source ABI/INFORM Global (ProQuest); Springer Nature
subjects 1193: Intelligent Processing of Multimedia Signals
Accuracy
Algorithms
Computer Communication Networks
Computer Science
Data Structures and Information Theory
Datasets
Emotion recognition
Emotions
Feature extraction
Feature recognition
Multimedia Information Systems
Special Purpose and Application-Based Systems
Speech
Speech recognition
title Speech emotion recognition based on multi‐feature and multi‐lingual fusion