
Tracking formants in spectrograms and its application in speaker verification

Bibliographic Details
Main Authors:	Jia-Guu Leu, Liang-tsair Geeng, Chang En Pu, Jyh-Bin Shiau
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; Cities and towns; Forensics; formant; Roads; Smoothing methods; speaker verification; Spectrogram; Speech; text sensitive; tracking

cited_by
cites
container_end_page	89
container_issue
container_start_page	83
container_title
container_volume
creator	Jia-Guu Leu ; Liang-tsair Geeng ; Chang En Pu ; Jyh-Bin Shiau
description	Formants are the most visible features in spectrograms and they also hold the most valuable speech information. Traditionally, formant tracks are found by first finding formant points in individual frames and then joining the formant points in neighboring frames to form tracks. In this paper we present a formant tracking approach based on image processing techniques. Our approach is to first find the running directions of the formants in a spectrogram. We then smooth the spectrogram along the directions of the formants to produce formants that are more continuous and stable, and perform ridge detection to find formant track candidates in the spectrogram. After removing tracks that are too short or too weak, we fit the remaining tracks with second-degree polynomial curves to extract formants that are both smooth and continuous. Besides extracting thin formant tracks, we also extracted formant tracks with width. These thick formants are able to indicate not only the locations of the formants but also their widths. Using the voices of 70 people, we conducted experiments to test the effectiveness of the thin formants and the thick formants when they are used in speaker verification. Using only one sentence (6 to 10 words, 3 seconds in length) for comparison, the thin formants and the thick formants achieve speaker verification accuracies of 88.3% and 93.8%, respectively. When the number of sentences for comparison is increased to seven, the accuracy improves to 93.8% and 98.7%, respectively.
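The pruning and curve-fitting step mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `fit_track` and `smooth_track`, the thresholds `min_len` and `min_strength`, and the representation of a candidate track as parallel arrays of frame index, frequency, and ridge strength are all assumptions made for illustration. The record does not say how "too short" or "too weak" are defined.

```python
import numpy as np

def fit_track(frames, freqs, strengths, min_len=5, min_strength=0.1):
    """Fit a second-degree polynomial to one candidate formant track.

    Returns the coefficients (highest power first), or None when the
    track is pruned. min_len and min_strength are hypothetical
    thresholds; the source does not give the actual criteria.
    """
    frames = np.asarray(frames, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    # Discard tracks that are too short or too weak on average.
    if len(frames) < min_len or strengths.mean() < min_strength:
        return None
    # Least-squares fit of freq = a*frame^2 + b*frame + c.
    return np.polyfit(frames, freqs, deg=2)

def smooth_track(frames, coeffs):
    """Evaluate the fitted curve at the given frame indices."""
    return np.polyval(coeffs, np.asarray(frames, dtype=float))

# Example: a noiseless quadratic track is recovered by the fit.
frames = np.arange(10)
freqs = 2.0 * frames**2 + 3.0 * frames + 500.0
coeffs = fit_track(frames, freqs, np.ones(10))
short = fit_track(frames[:3], freqs[:3], np.ones(3))
```

In this sketch a pruned track is signalled by `None`, so a caller can filter candidates before evaluating `smooth_track` on the survivors.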
doi_str_mv	10.1109/CCST.2012.6393541
format	conference_proceeding
date	2012-10
eisbn	9781467324496 1467324515 1467324493 9781467324519
isbn	1467324507 9781467324502
linktohtml	https://ieeexplore.ieee.org/document/6393541
publisher	IEEE
tpages	7
fulltext	fulltext_linktorsrc
identifier	ISSN: 1071-6572
ispartof	2012 IEEE International Carnahan Conference on Security Technology (ICCST), 2012, p.83-89
issn	1071-6572 2153-0742
language	eng
recordid	cdi_ieee_primary_6393541
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Accuracy; Cities and towns; Forensics; formant; Roads; Smoothing methods; speaker verification; Spectrogram; Speech; text sensitive; tracking
title	Tracking formants in spectrograms and its application in speaker verification
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T17%3A46%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Tracking%20formants%20in%20spectrograms%20and%20its%20application%20in%20speaker%20verification&rft.btitle=2012%20IEEE%20International%20Carnahan%20Conference%20on%20Security%20Technology%20(ICCST)&rft.au=Jia-Guu%20Leu&rft.date=2012-10&rft.spage=83&rft.epage=89&rft.pages=83-89&rft.issn=1071-6572&rft.eissn=2153-0742&rft.isbn=1467324507&rft.isbn_list=9781467324502&rft_id=info:doi/10.1109/CCST.2012.6393541&rft.eisbn=9781467324496&rft.eisbn_list=1467324515&rft.eisbn_list=1467324493&rft.eisbn_list=9781467324519&rft_dat=%3Cieee_6IE%3E6393541%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-6480a6d47b6eeec033cf06c0f05d4475460cb0a7240484a3d504a7914de87833%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6393541&rfr_iscdi=true