
The importance of optimal parameter setting for pitch extraction

In this study we present a performance comparison for five pitch extraction algorithms: Auto Correlation, Cross Correlation, and Sub-Harmonic Summation (as implemented in PRAAT [Boersma and Weenink (2010)]), the Robust Algorithm for Pitch Tracking implemented in ESPS [Talkin (1995)], and SWIPE'...

Bibliographic Details
Published in: Proceedings of meetings on acoustics, 2011-06, Vol. 11 (1)
Main Authors: Evanini, Keelan; Lai, Catherine; Zechner, Klaus
Format: Article
Language: English
container_issue 1
container_title Proceedings of meetings on acoustics
container_volume 11
creator Evanini, Keelan
Lai, Catherine
Zechner, Klaus
description In this study we present a performance comparison for five pitch extraction algorithms: Auto Correlation (AC), Cross Correlation (CC), and Sub-Harmonic Summation (SHS), as implemented in PRAAT [Boersma and Weenink (2010)]; the Robust Algorithm for Pitch Tracking (RAPT), as implemented in ESPS [Talkin (1995)]; and SWIPE' [Camacho (2007)]. Recent research showed that SHS and SWIPE' outperformed the other algorithms on two speech databases with EGG reference values [Camacho (2007)]. That study, however, used a fixed search range of 40-800 Hz for all speakers, regardless of sex or speaker-specific pitch characteristics. In the current study, we adopt the parameter optimization strategy from De Looze and Rauzy (2009) to calculate a specific pitch floor and ceiling for each speaker. Our results show a substantial improvement in the accuracy of the AC, CC, and RAPT algorithms when the optimized parameters are used (especially for the female speakers), and with these parameters all five algorithms show similar performance. The gross error rate for all five algorithms ranges from 0.1% to 0.3% (N=18 098) on the FDA database [Bagshaw (1994)] and from 0.2% to 0.4% (N=11 527) on the Keele database [Plante et al. (1995)]. Our study thus highlights the importance of pre-processing the speech signal to determine optimal speaker-specific parameters for pitch extraction.
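The two ideas in the description, a speaker-specific search range and a gross error rate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the quantile levels and scaling coefficients are placeholder defaults that this record does not confirm, and the 20% tolerance is a commonly used convention that the record does not state.

```python
def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def speaker_pitch_range(f0_first_pass, floor_q=0.35, ceil_q=0.65,
                        floor_scale=0.72, floor_offset=-10.0,
                        ceil_scale=1.9, ceil_offset=10.0):
    """Derive a per-speaker (floor, ceiling) in Hz from a first-pass F0 track.

    The first pass is assumed to use a wide default range; frames with
    F0 <= 0 are treated as unvoiced. All default coefficients are
    ASSUMPTIONS for illustration and are not taken from the record.
    """
    voiced = sorted(f for f in f0_first_pass if f > 0)
    if not voiced:
        raise ValueError("no voiced frames in first-pass track")
    floor = _quantile(voiced, floor_q) * floor_scale + floor_offset
    ceiling = _quantile(voiced, ceil_q) * ceil_scale + ceil_offset
    return floor, ceiling


def gross_error_rate(estimates, references, tolerance=0.2):
    """Share of frames, voiced in both tracks, whose estimate deviates from
    the reference by more than `tolerance` (relative). The 20% default is
    an assumed convention, not a value given in the record."""
    pairs = [(e, r) for e, r in zip(estimates, references) if e > 0 and r > 0]
    if not pairs:
        raise ValueError("no frames voiced in both tracks")
    errors = sum(1 for e, r in pairs if abs(e - r) > tolerance * r)
    return errors / len(pairs)
```

In use, the extractor would be run once with a wide range, `speaker_pitch_range` applied to that output, and the extractor run again with the returned floor and ceiling before scoring the second pass with `gross_error_rate` against the EGG reference values.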
doi_str_mv 10.1121/1.3609833
format article
coden PMARCW
creationdate 2011-06-22
oa free_for_read
rights Acoustical Society of America
toplevel peer_reviewed
tpages 10
fulltext fulltext_linktorsrc
identifier EISSN: 1939-800X
ispartof Proceedings of meetings on acoustics, 2011-06, Vol.11 (1)
issn 1939-800X
language eng
recordid cdi_scitation_primary_10_1121_1_3609833
source AIP Open Access Journals
title The importance of optimal parameter setting for pitch extraction
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T20%3A06%3A28IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-scitation_AJDQP&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20importance%20of%20optimal%20parameter%20setting%20for%20pitch%20extraction&rft.jtitle=Proceedings%20of%20meetings%20on%20acoustics&rft.au=Keelan,%20Evanini&rft.date=2011-06-22&rft.volume=11&rft.issue=1&rft.eissn=1939-800X&rft.coden=PMARCW&rft_id=info:doi/10.1121/1.3609833&rft_dat=%3Cscitation_AJDQP%3Epoma%3C/scitation_AJDQP%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-s1403-43450e1ea54475cd53bc5ccf9db8afaf820870244f360bcc92ac0f7f3a85beed3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true