How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?
Main Author: Yoon, Tae-Jin (ORCID: 0000-0002-1338-4897)
Published in: Applied Sciences, 2024-12, Vol. 14 (23), p. 10972
Publisher: MDPI AG, Basel
Format: Article
Language: English
ISSN / EISSN: 2076-3417
DOI: 10.3390/app142310972
Online Access: Open access (CC BY); full text freely available
Subjects: Accuracy; Acoustics; Classification; Datasets; Deep learning; emotional speech recognition; Emotions; fundamental frequency (F0); generalized additive mixed models (GAMMs); Linguistics; non-linear dynamics; pitch contours; Speech; speech processing; Wavelet transforms
Abstract: The modulation of vocal elements, such as pitch, loudness, and duration, plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, accurately classifying emotion remains challenging due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study examines the influence of emotion and temporal variation on dynamic F0 contours within a single analytical framework, utilizing a dataset valuable for its diverse emotional expressions. However, the analysis is constrained by the limited variety of sentences employed, which may affect the generalizability of the findings to broader linguistic contexts. We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states performed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing Generalized Additive Mixed Models (GAMMs), we modeled non-linear trajectories of F0 contours over time, accounting for fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific, non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems.
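The abstract notes that F0 measurements were converted to semitones relative to a 100 Hz baseline. For readers unfamiliar with this transform, here is a minimal sketch of the standard conversion; the function name and sample values are illustrative, not taken from the paper:

```python
import numpy as np

def hz_to_semitones(f0_hz, baseline_hz=100.0):
    """Convert F0 values in Hz to semitones relative to a baseline frequency.

    Uses the standard transform 12 * log2(f / f_ref): with a 100 Hz
    baseline, 100 Hz maps to 0 st and each octave above adds 12 st.
    """
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / baseline_hz)

# Example: a short F0 track (Hz) from a hypothetical sonorant segment.
f0_track = [100.0, 120.0, 150.0, 200.0]
print(hz_to_semitones(f0_track))  # approx. [0.00, 3.16, 7.02, 12.00]
```

This transform puts all speakers on a common logarithmic pitch scale, so that the same musical interval corresponds to the same semitone difference regardless of a speaker's register.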
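The modeling step can be made concrete with a rough model equation. The following is our own sketch of a GAMM with an emotion fixed effect, emotion-specific time smooths, and per-speaker random smooths, as described in the abstract; the notation and the exact random-effects structure are assumptions, not taken from the paper:

```latex
% F0 (in semitones) at normalized time t, for utterance i
% produced by speaker j in emotion e(i):
F0_{ij}(t) = \beta_0 + \beta_{e(i)} + f_{e(i)}(t) + g_j(t) + \varepsilon_{ij}(t)
```

Here \beta_{e(i)} is the fixed effect of emotion, f_e(t) an emotion-specific non-linear smooth of time, g_j(t) a random smooth of time within speaker j, and \varepsilon residual error. In practice, GAMMs of this kind are typically fit with the mgcv package in R.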