How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?

The modulation of vocal elements, such as pitch, loudness, and duration, plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, accurately classifying emotion remains challenging due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study examines the influences of emotion and temporal variation on dynamic F0 contours within a single analytical framework, utilizing a dataset valuable for its diverse emotional expressions; however, the analysis is constrained by the limited variety of sentences employed, which may affect the generalizability of the findings to broader linguistic contexts. We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states performed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing Generalized Additive Mixed Models (GAMMs), we modeled non-linear trajectories of F0 contours over time, accounting for fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific, non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems.
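
The semitone conversion mentioned in the abstract is a standard transform: pitch in Hz is mapped onto a logarithmic scale with 12 semitones per octave (i.e., per doubling of frequency). Below is a minimal sketch in Python, assuming NumPy and the 100 Hz baseline stated in the abstract; the function name and input handling are illustrative, not taken from the paper.

```python
import numpy as np

def hz_to_semitones(f0_hz, baseline_hz=100.0):
    """Convert F0 values in Hz to semitones relative to a baseline.

    Twelve semitones span one octave (a doubling of frequency), so the
    distance from the baseline is 12 * log2(f0 / baseline). The abstract
    reports conversion relative to a 100 Hz baseline.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    return 12.0 * np.log2(f0 / baseline_hz)

# 100 Hz maps to 0 ST; 200 Hz, one octave up, maps to +12 ST.
print(hz_to_semitones([100.0, 150.0, 200.0]))  # approx. [0.0, 7.02, 12.0]
```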

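The model family named in the abstract, a GAMM with emotion-specific non-linear time effects and random smooths for time within speakers, can be written schematically as follows. This is one plausible reading of the description, not the paper’s exact specification.

```latex
% F0 (in semitones) of utterance i by speaker j at normalized time t:
% global intercept, emotion-specific offset and smooth of time,
% a speaker-level random smooth of time, and residual error.
F0_{ij}(t) = \beta_0 + \beta_{e(i)} + f_{e(i)}(t) + g_j(t) + \varepsilon_{ij}(t)
```

Here e(i) indexes the emotion of utterance i, f_{e(i)} is the emotion-specific smooth whose inclusion the abstract reports as significantly improving explanatory power, and g_j is the per-speaker random smooth credited with capturing individual temporal modulation patterns.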
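GAMMs of this kind are typically fitted in R with mgcv, though the abstract does not name the software. As a rough Python analogue, the sketch below uses pyGAM; this is an assumption, since pyGAM lacks mgcv-style factor random smooths, so speaker enters only as a categorical offset and the emotion-specific smooths are approximated with one-hot indicator columns passed to the `by=` argument. The input file and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from pygam import LinearGAM, f, s

# Hypothetical long-format table: one row per F0 sample, with columns
# time (0-1 within utterance), emotion (0-7), speaker (0-23), and
# f0_st (F0 in semitones re 100 Hz).
df = pd.read_csv("ravdess_f0_long.csv")

# One-hot emotion indicators so each emotion gets its own smooth of time.
emo = pd.get_dummies(df["emotion"], prefix="emo").astype(float)
X = np.column_stack([df["time"], df["speaker"], emo])

# Categorical speaker offset (column 1) plus, for each emotion indicator
# (columns 2 onward), a smooth of time (column 0) gated by that
# indicator via `by=`.
terms = f(1)
for k in range(emo.shape[1]):
    terms = terms + s(0, by=2 + k, n_splines=10)

gam = LinearGAM(terms).fit(X, df["f0_st"])
print(gam.statistics_["pseudo_r2"])  # explained deviance, cf. the 66.5% reported
```

A closer match to the abstract’s random smooths for time within speakers would add a separate smooth of time per speaker, which is straightforward in mgcv (`s(time, speaker, bs = "fs")`) but has no direct pyGAM equivalent.
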
Bibliographic Details
Published in: Applied Sciences, 2024-12, Vol. 14 (23), p. 10972
Main Author: Yoon, Tae-Jin
Format: Article
Language: English
Publisher: MDPI AG (Basel)
DOI: 10.3390/app142310972
ISSN / EISSN: 2076-3417
Author ORCID: https://orcid.org/0000-0002-1338-4897
Subjects: Accuracy; Acoustics; Classification; Datasets; Deep learning; emotional speech recognition; Emotions; fundamental frequency (F0); generalized additive mixed models (GAMMs); Linguistics; non-linear dynamics; pitch contours; Speech; speech processing; Wavelet transforms
Source: Publicly Available Content Database (ProQuest)