"We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations

Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets cr...

Bibliographic Details
Published in: arXiv.org 2023-08
Main Authors: Abhishek, N V S; Bhattacharyya, Pushpak
Format: Article
Language: English
container_title arXiv.org
creator Abhishek, N V S
Bhattacharyya, Pushpak
description Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.
format article
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2847574691</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2847574691</sourcerecordid><originalsourceid>FETCH-proquest_journals_28475746913</originalsourceid><addsrcrecordid>eNqNjcsKgkAYhYcgSMp3-LG1YDPeajsYtWhTQbQS0T8byRmbUenxm6IHaHUOfOcyIQ5lbOWnIaUz4hrTBEFA44RGEXPI1bsglIVGbwP7ttNqFLIGriqEg3hhBacOsbxD1qpeKAlHLFUtxdcLCXwwvWpR-9xO2JocUZviQ82CTG_Fw6D70zlZbrMz3_n25Dmg6fNGDVpalNM0TKIkjNcr9l_qDTSsQX0</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2847574691</pqid></control><display><type>article</type><title>"We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations</title><source>Publicly Available Content Database</source><creator>Abhishek, N V S ; Bhattacharyya, Pushpak</creator><creatorcontrib>Abhishek, N V S ; Bhattacharyya, Pushpak</creatorcontrib><description>Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. 
In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Arousal ; Customers ; Datasets ; Emotion recognition ; Emotions ; Speech recognition</subject><ispartof>arXiv.org, 2023-08</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2847574691?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,37012,44590</link.rule.ids></links><search><creatorcontrib>Abhishek, N V S</creatorcontrib><creatorcontrib>Bhattacharyya, Pushpak</creatorcontrib><title>"We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations</title><title>arXiv.org</title><description>Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. 
Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. 
Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.</description><subject>Arousal</subject><subject>Customers</subject><subject>Datasets</subject><subject>Emotion recognition</subject><subject>Emotions</subject><subject>Speech recognition</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNjcsKgkAYhYcgSMp3-LG1YDPeajsYtWhTQbQS0T8byRmbUenxm6IHaHUOfOcyIQ5lbOWnIaUz4hrTBEFA44RGEXPI1bsglIVGbwP7ttNqFLIGriqEg3hhBacOsbxD1qpeKAlHLFUtxdcLCXwwvWpR-9xO2JocUZviQ82CTG_Fw6D70zlZbrMz3_n25Dmg6fNGDVpalNM0TKIkjNcr9l_qDTSsQX0</recordid><startdate>20230806</startdate><enddate>20230806</enddate><creator>Abhishek, N V S</creator><creator>Bhattacharyya, Pushpak</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PTHSS</scope></search><sort><creationdate>20230806</creationdate><title>"We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations</title><author>Abhishek, N V S ; Bhattacharyya, Pushpak</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_28475746913</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Arousal</topic><topic>Customers</topic><topic>Datasets</topic><topic>Emotion recognition</topic><topic>Emotions</topic><topic>Speech recognition</topic><toplevel>online_resources</toplevel><creatorcontrib>Abhishek, N V S</creatorcontrib><creatorcontrib>Bhattacharyya, 
Pushpak</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>Engineering collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Abhishek, N V S</au><au>Bhattacharyya, Pushpak</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>"We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations</atitle><jtitle>arXiv.org</jtitle><date>2023-08-06</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Speech Emotion Recognition (SER) is the task of identifying the emotion expressed in a spoken utterance. Emotion recognition is essential in building robust conversational agents in domains such as law, healthcare, education, and customer support. Most of the studies published on SER use datasets created by employing professional actors in a noise-free environment. In natural settings such as a customer care conversation, the audio is often noisy with speakers regularly switching between different languages as they see fit. 
We have worked in collaboration with a leading unicorn in the Conversational AI sector to develop Natural Speech Emotion Dataset (NSED). NSED is a natural code-mixed speech emotion dataset where each utterance in a conversation is annotated with emotion, sentiment, valence, arousal, and dominance (VAD) values. In this paper, we show that by incorporating word-level VAD value we improve on the task of SER by 2%, for negative emotions, over the baseline value for NSED. High accuracy for negative emotion recognition is essential because customers expressing negative opinions/views need to be pacified with urgency, lest complaints and dissatisfaction snowball and get out of hand. Escalation of negative opinions speedily is crucial for business interests. Our study then can be utilized to develop conversational agents which are more polite and empathetic in such situations.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2023-08
issn 2331-8422
language eng
recordid cdi_proquest_journals_2847574691
source Publicly Available Content Database
subjects Arousal
Customers
Datasets
Emotion recognition
Emotions
Speech recognition
title "We care": Improving Code Mixed Speech Emotion Recognition in Customer-Care Conversations
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T07%3A03%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=%22We%20care%22:%20Improving%20Code%20Mixed%20Speech%20Emotion%20Recognition%20in%20Customer-Care%20Conversations&rft.jtitle=arXiv.org&rft.au=Abhishek,%20N%20V%20S&rft.date=2023-08-06&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2847574691%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_28475746913%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2847574691&rft_id=info:pmid/&rfr_iscdi=true