
Compensation of Nuisance Factors for Speaker and Language Recognition

Bibliographic Details
Published in: IEEE transactions on audio, speech, and language processing, 2007-09, Vol.15 (7), p.1969-1978
Main Authors: Castaldo, F., Colibro, D., Dalmasso, E., Laface, P., Vair, C.
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623
cites cdi_FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623
container_end_page 1978
container_issue 7
container_start_page 1969
container_title IEEE transactions on audio, speech, and language processing
container_volume 15
creator Castaldo, F.
Colibro, D.
Dalmasso, E.
Laface, P.
Vair, C.
description The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, whereas in the feature domain, blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of different nature and complexity, and for different tasks. In this paper, we evaluate the effects of compensating intersession variability by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain with our proposed feature-domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data, at a reduced computational cost. We also report the results of a system based on the feature-space intersession compensation technique that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained a significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.
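The record does not give the compensation equation. A formulation commonly used for feature-domain channel-factor compensation (assumed here, not quoted from the article) subtracts from each frame o_t a posterior-weighted channel offset: ô_t = o_t − Σ_m γ_m(t) U_m x, where γ_m(t) is the occupation probability of component m of a diagonal-covariance GMM (UBM), U_m is the block of the low-rank channel subspace for that component, and x is the channel-factor vector estimated for the utterance. The sketch below applies only that step and assumes U and x are already available; how they are trained and estimated is not shown. All function names are hypothetical, and this is an illustration under those assumptions rather than the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gaussian_diag(o: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame o."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (oi - mi) ** 2 / v
        for oi, mi, v in zip(o, mean, var)
    )


def posteriors(o: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]], variances: Sequence[Sequence[float]]) -> Vector:
    """Occupation probabilities gamma_m(t) of each GMM component for frame o."""
    log_probs = [math.log(w) + log_gaussian_diag(o, m, v)
                 for w, m, v in zip(weights, means, variances)]
    top = max(log_probs)  # subtract the maximum (log-sum-exp trick) to avoid underflow
    exps = [math.exp(lp - top) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def compensate_frame(o: Sequence[float], weights, means, variances,
                     U: Sequence[Sequence[Sequence[float]]], x: Sequence[float]) -> Vector:
    """Return o - sum_m gamma_m(o) * U_m x for a single frame.

    U[m] is the D x R block of the (assumed, pre-trained) channel subspace for
    component m; x is the R-dimensional channel-factor vector of the utterance.
    """
    gammas = posteriors(o, weights, means, variances)
    offset = [0.0] * len(o)
    for gamma, U_m in zip(gammas, U):
        # U_m x: map the channel factors into this component's feature space
        shift = [sum(u * xr for u, xr in zip(row, x)) for row in U_m]
        for d in range(len(o)):
            offset[d] += gamma * shift[d]
    return [od - sd for od, sd in zip(o, offset)]


def compensate_utterance(frames: Sequence[Sequence[float]], weights, means, variances,
                         U, x) -> List[Vector]:
    """Apply the same utterance-level channel factors x to every frame."""
    return [compensate_frame(o, weights, means, variances, U, x) for o in frames]


if __name__ == "__main__":
    # Toy 1-D example: two well-separated components with different channel offsets.
    out = compensate_utterance([[0.0], [10.0]], [0.5, 0.5], [[0.0], [10.0]],
                               [[1.0], [1.0]], [[[1.0]], [[-2.0]]], [0.5])
    print(out)
```

Because the output frames are ordinary feature vectors, they can be passed to any back end without further changes, which is the advantage the description attributes to compensating features rather than models.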
doi_str_mv 10.1109/TASL.2007.901823
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_875061856</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4291593</ieee_id><sourcerecordid>2568802681</sourcerecordid><originalsourceid>FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623</originalsourceid><addsrcrecordid>eNp9kLFOwzAQQC0EEqWwI7FYDDC1nGPHiUdUtYAUgUTLbLnuuUpp42AnA39PoqAODEx3w3sn3SPkmsGUMVAPq8dlMU0AsqkClif8hIxYmuaTTCXi9LgzeU4uYtwBCC4FG5H5zB9qrKJpSl9R7-hrW0ZTWaQLYxsfInU-0GWN5hMDNdWGFqbatmaL9B2t31ZlL16SM2f2Ea9-55h8LOar2fOkeHt6mT0WE8tV2kwkOMgFcJlaIZzIDEjLLMLGKotKmWQtVeK4TTrKuRw7SXJjLduAy9cy4WNyP9ytg_9qMTb6UEaL-72p0LdR51kKkuWdNSZ3_5JccpZL4B14-wfc-TZU3RdasUwwIQE6CAbIBh9jQKfrUB5M-NYMdJ9f9_l1n18P-TvlZlBKRDziIlEsVZz_AL_Yf44</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917414600</pqid></control><display><type>article</type><title>Compensation of Nuisance Factors for Speaker and Language Recognition</title><source>IEEE Xplore (Online service)</source><creator>Castaldo, F.. ; Colibro, D.. ; Dalmasso, E.. ; Laface, P.. ; Vair, C..</creator><creatorcontrib>Castaldo, F.. ; Colibro, D.. ; Dalmasso, E.. ; Laface, P.. ; Vair, C..</creatorcontrib><description>The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. 
In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2007.901823</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Acoustic distortion ; Acoustic testing ; Automatic speech recognition ; Channels ; Compensation ; Computational efficiency ; Estimating ; Factor analysis ; feature compensation ; Gaussian ; Gaussian distribution ; language recognition ; Loudspeakers ; Mathematical models ; Natural languages ; NIST ; Nuisance ; Recognition ; Speaker recognition ; Speech recognition ; Studies ; System testing</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2007-09, Vol.15 (7), p.1969-1978</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2007</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623</citedby><cites>FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4291593$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,54774</link.rule.ids></links><search><creatorcontrib>Castaldo, F..</creatorcontrib><creatorcontrib>Colibro, D..</creatorcontrib><creatorcontrib>Dalmasso, E..</creatorcontrib><creatorcontrib>Laface, P..</creatorcontrib><creatorcontrib>Vair, C..</creatorcontrib><title>Compensation of Nuisance Factors for Speaker and Language Recognition</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. 
In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.</description><subject>Acoustic distortion</subject><subject>Acoustic testing</subject><subject>Automatic speech recognition</subject><subject>Channels</subject><subject>Compensation</subject><subject>Computational efficiency</subject><subject>Estimating</subject><subject>Factor analysis</subject><subject>feature compensation</subject><subject>Gaussian</subject><subject>Gaussian distribution</subject><subject>language recognition</subject><subject>Loudspeakers</subject><subject>Mathematical models</subject><subject>Natural languages</subject><subject>NIST</subject><subject>Nuisance</subject><subject>Recognition</subject><subject>Speaker recognition</subject><subject>Speech recognition</subject><subject>Studies</subject><subject>System 
testing</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2007</creationdate><recordtype>article</recordtype><recordid>eNp9kLFOwzAQQC0EEqWwI7FYDDC1nGPHiUdUtYAUgUTLbLnuuUpp42AnA39PoqAODEx3w3sn3SPkmsGUMVAPq8dlMU0AsqkClif8hIxYmuaTTCXi9LgzeU4uYtwBCC4FG5H5zB9qrKJpSl9R7-hrW0ZTWaQLYxsfInU-0GWN5hMDNdWGFqbatmaL9B2t31ZlL16SM2f2Ea9-55h8LOar2fOkeHt6mT0WE8tV2kwkOMgFcJlaIZzIDEjLLMLGKotKmWQtVeK4TTrKuRw7SXJjLduAy9cy4WNyP9ytg_9qMTb6UEaL-72p0LdR51kKkuWdNSZ3_5JccpZL4B14-wfc-TZU3RdasUwwIQE6CAbIBh9jQKfrUB5M-NYMdJ9f9_l1n18P-TvlZlBKRDziIlEsVZz_AL_Yf44</recordid><startdate>20070901</startdate><enddate>20070901</enddate><creator>Castaldo, F..</creator><creator>Colibro, D..</creator><creator>Dalmasso, E..</creator><creator>Laface, P..</creator><creator>Vair, C..</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20070901</creationdate><title>Compensation of Nuisance Factors for Speaker and Language Recognition</title><author>Castaldo, F.. ; Colibro, D.. ; Dalmasso, E.. ; Laface, P.. 
; Vair, C..</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Acoustic distortion</topic><topic>Acoustic testing</topic><topic>Automatic speech recognition</topic><topic>Channels</topic><topic>Compensation</topic><topic>Computational efficiency</topic><topic>Estimating</topic><topic>Factor analysis</topic><topic>feature compensation</topic><topic>Gaussian</topic><topic>Gaussian distribution</topic><topic>language recognition</topic><topic>Loudspeakers</topic><topic>Mathematical models</topic><topic>Natural languages</topic><topic>NIST</topic><topic>Nuisance</topic><topic>Recognition</topic><topic>Speaker recognition</topic><topic>Speech recognition</topic><topic>Studies</topic><topic>System testing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Castaldo, F..</creatorcontrib><creatorcontrib>Colibro, D..</creatorcontrib><creatorcontrib>Dalmasso, E..</creatorcontrib><creatorcontrib>Laface, P..</creatorcontrib><creatorcontrib>Vair, C..</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Castaldo, F..</au><au>Colibro, D..</au><au>Dalmasso, E..</au><au>Laface, P..</au><au>Vair, C..</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Compensation of Nuisance Factors for Speaker and Language Recognition</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2007-09-01</date><risdate>2007</risdate><volume>15</volume><issue>7</issue><spage>1969</spage><epage>1978</epage><pages>1969-1978</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>The variability of the channel and environment is one of the most important factors affecting the performance of text-independent speaker verification systems. The best techniques for channel compensation are model based. Most of them have been proposed for Gaussian mixture models, while in the feature domain blind channel compensation is usually performed. The aim of this work is to explore techniques that allow more accurate intersession compensation in the feature domain. Compensating the features rather than the models has the advantage that the transformed parameters can be used with models of a different nature and complexity and for different tasks. In this paper, we evaluate the effects of the compensation of the intersession variability obtained by means of the channel factors approach. In particular, we compare channel variability modeling in the usual Gaussian mixture model domain, and our proposed feature domain compensation technique. We show that the two approaches lead to similar results on the NIST 2005 Speaker Recognition Evaluation data with a reduced computation cost. We also report the results of a system, based on the intersession compensation technique in the feature space that was among the best participants in the NIST 2006 Speaker Recognition Evaluation. 
Moreover, we show how we obtained significant performance improvement in language recognition by estimating and compensating, in the feature domain, the distortions due to interspeaker variability within the same language.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TASL.2007.901823</doi><tpages>10</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2007-09, Vol.15 (7), p.1969-1978
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_proquest_miscellaneous_875061856
source IEEE Xplore (Online service)
subjects Acoustic distortion
Acoustic testing
Automatic speech recognition
Channels
Compensation
Computational efficiency
Estimating
Factor analysis
feature compensation
Gaussian
Gaussian distribution
language recognition
Loudspeakers
Mathematical models
Natural languages
NIST
Nuisance
Recognition
Speaker recognition
Speech recognition
Studies
System testing
title Compensation of Nuisance Factors for Speaker and Language Recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T19%3A25%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Compensation%20of%20Nuisance%20Factors%20for%20Speaker%20and%20Language%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Castaldo,%20F..&rft.date=2007-09-01&rft.volume=15&rft.issue=7&rft.spage=1969&rft.epage=1978&rft.pages=1969-1978&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2007.901823&rft_dat=%3Cproquest_cross%3E2568802681%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c395t-60f0840365c44f47a06c1ce0dc9ce99a2b692f3c2084ff8e39563acc1d0f8b623%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=917414600&rft_id=info:pmid/&rft_ieee_id=4291593&rfr_iscdi=true