Speech emotion recognition based on multi‐feature and multi‐lingual fusion
Published in: | Multimedia tools and applications 2022-02, Vol.81 (4), p.4897-4907 |
---|---|
Main Authors: | Wang, Chunyi; Ren, Ying; Zhang, Na; Cui, Fuwei; Luo, Shiying |
Format: | Article |
Language: | English |
container_end_page | 4907 |
container_issue | 4 |
container_start_page | 4897 |
container_title | Multimedia tools and applications |
container_volume | 81 |
creator | Wang, Chunyi Ren, Ying Zhang, Na Cui, Fuwei Luo, Shiying |
description | A speech emotion recognition algorithm based on multi-feature and multi-lingual fusion is proposed to address the low recognition accuracy caused by the lack of large speech datasets and the low robustness of acoustic features in speech emotion recognition. First, handcrafted and deep automatic features are extracted from existing Chinese and English emotional speech data. Then, the various features are fused within each language. Finally, the fused features of the different languages are fused again and used to train a classification model. Comparing the fused features with the unfused ones, the results show that the fused features significantly improve the accuracy of the speech emotion recognition algorithm. The proposed solution is evaluated on two Chinese corpora and two English corpora and is shown to provide more accurate predictions than the original solution. As a result of this study, the multi-feature and multi-lingual fusion algorithm can significantly improve speech emotion recognition accuracy when the dataset is small. |
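The two-stage pipeline the abstract describes (fuse handcrafted and deep features within each language, then fuse the per-language representations across languages before classification) can be sketched as below. This is a minimal illustration only: the record does not specify the fusion operator or feature dimensions, so the concatenation-based fusion, the 40-dim "handcrafted" vectors (e.g. MFCC statistics), and the 128-dim "deep" embeddings are all assumptions, not the authors' actual method.

```python
# Sketch of the two-stage fusion pipeline described in the abstract.
# All dimensions and values are illustrative placeholders.

def fuse(*feature_vectors):
    """Fuse feature vectors by simple concatenation (an assumed operator)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def two_stage_fusion(handcrafted_zh, deep_zh, handcrafted_en, deep_en):
    # Stage 1: fuse handcrafted and deep features within each language.
    zh = fuse(handcrafted_zh, deep_zh)
    en = fuse(handcrafted_en, deep_en)
    # Stage 2: fuse the per-language representations across languages;
    # the result would be fed to a classification model.
    return fuse(zh, en)

# Hypothetical sizes: 40-dim handcrafted features, 128-dim deep embeddings.
fused = two_stage_fusion([0.0] * 40, [0.0] * 128, [0.0] * 40, [0.0] * 128)
print(len(fused))  # prints 336
```

In practice each placeholder list would be replaced by real acoustic features extracted per utterance, and the fused vector trained with whatever classifier the paper uses.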
doi_str_mv | 10.1007/s11042-021-10553-4 |
format | article |
fulltext | fulltext |
identifier | ISSN: 1380-7501 |
ispartof | Multimedia tools and applications, 2022-02, Vol.81 (4), p.4897-4907 |
issn | 1380-7501 1573-7721 |
language | eng |
source | ABI/INFORM Global (ProQuest); Springer Nature |
subjects | 1193: Intelligent Processing of Multimedia Signals Accuracy Algorithms Computer Communication Networks Computer Science Data Structures and Information Theory Datasets Emotion recognition Emotions Feature extraction Feature recognition Multimedia Information Systems Special Purpose and Application-Based Systems Speech Speech recognition |
title | Speech emotion recognition based on multi‐feature and multi‐lingual fusion |