A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition

This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of th...

Bibliographic Details
Published in: IEEE transactions on audio, speech, and language processing, 2011-07, Vol.19 (5), p.1434-1443
Main Authors: Wooil Kim; Hansen, John H L
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23
cites cdi_FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23
container_end_page 1443
container_issue 5
container_start_page 1434
container_title IEEE transactions on audio, speech, and language processing
container_volume 19
creator Wooil Kim
Hansen, John H L
description This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.
doi_str_mv 10.1109/TASL.2010.2091633
format article
fullrecord <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_24286322</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5667043</ieee_id><sourcerecordid>2559708021</sourcerecordid><originalsourceid>FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23</originalsourceid><addsrcrecordid>eNpdkU1LxDAQhoso-PkDxEsQBC9dk7Rp0-Mq6wfsqrjruaTpVKPdpmbaBcEfb8que_CUDHned2byBsEpoyPGaHa1GM-nI059yWnGkijaCQ6YEDJMMx7vbu8s2Q8OET8ojaMkZgfBz5g82hXUZKbwk0ywM0vVGduQGXTvtiSTZVvbb9O8kWeLHThjXXitEEryAq0DhKbz_Ao8r5o_PZDKOjIziF4Y3oLqegdk3gLod6_T9q0xQ5PjYK9SNcLJ5jwKXm8ni5v7cPp093AznoY6EqILC17IuFQKkojHUqZpKhgV0lciyzQvqcgqLeJE-lpXkaqkViXERaJpwcuSR0fB5dq3dfarB-zypUENda0asD3mzP-blNQ7e_T8H_phe9f46fKMiSRJpWQeYmtIO4vooMpb5_d2394pH-LIhzjyIY58E4fXXGyMFWpVV0412uBWyGM-LDTMerbmDABsn4fOPrLoF6jqk9Y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>915667881</pqid></control><display><type>article</type><title>A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition</title><source>IEEE Xplore (Online service)</source><creator>Wooil Kim ; Hansen, John H L</creator><creatorcontrib>Wooil Kim ; Hansen, John H L</creatorcontrib><description>This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. 
The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. 
By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2010.2091633</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Applied sciences ; Background noise ; Estimates ; Exact sciences and technology ; Information, signal and communications theory ; mask estimation ; Masks ; Mathematical models ; missing-feature ; Noise ; Pattern recognition ; posterior-based representative mean (PRM) estimate ; Reconstruction ; robust speech recognition ; Signal and communications theory ; Signal processing ; Signal representation. Spectral analysis ; Signal, noise ; Spectra ; Speech ; Speech processing ; Speech recognition ; Studies ; Telecommunications and information theory</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2011-07, Vol.19 (5), p.1434-1443</ispartof><rights>2015 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Jul 2011</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23</citedby><cites>FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5667043$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=24286322$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Wooil Kim</creatorcontrib><creatorcontrib>Hansen, John H L</creatorcontrib><title>A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>This paper proposes a novel mask estimation method for missing-feature reconstruction to improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. 
To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.</description><subject>Applied sciences</subject><subject>Background noise</subject><subject>Estimates</subject><subject>Exact sciences and technology</subject><subject>Information, signal and communications theory</subject><subject>mask estimation</subject><subject>Masks</subject><subject>Mathematical models</subject><subject>missing-feature</subject><subject>Noise</subject><subject>Pattern recognition</subject><subject>posterior-based representative mean (PRM) estimate</subject><subject>Reconstruction</subject><subject>robust speech recognition</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal representation. 
Spectral analysis</subject><subject>Signal, noise</subject><subject>Spectra</subject><subject>Speech</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Studies</subject><subject>Telecommunications and information theory</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><recordid>eNpdkU1LxDAQhoso-PkDxEsQBC9dk7Rp0-Mq6wfsqrjruaTpVKPdpmbaBcEfb8que_CUDHned2byBsEpoyPGaHa1GM-nI059yWnGkijaCQ6YEDJMMx7vbu8s2Q8OET8ojaMkZgfBz5g82hXUZKbwk0ywM0vVGduQGXTvtiSTZVvbb9O8kWeLHThjXXitEEryAq0DhKbz_Ao8r5o_PZDKOjIziF4Y3oLqegdk3gLod6_T9q0xQ5PjYK9SNcLJ5jwKXm8ni5v7cPp093AznoY6EqILC17IuFQKkojHUqZpKhgV0lciyzQvqcgqLeJE-lpXkaqkViXERaJpwcuSR0fB5dq3dfarB-zypUENda0asD3mzP-blNQ7e_T8H_phe9f46fKMiSRJpWQeYmtIO4vooMpb5_d2394pH-LIhzjyIY58E4fXXGyMFWpVV0412uBWyGM-LDTMerbmDABsn4fOPrLoF6jqk9Y</recordid><startdate>20110701</startdate><enddate>20110701</enddate><creator>Wooil Kim</creator><creator>Hansen, John H L</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20110701</creationdate><title>A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition</title><author>Wooil Kim ; Hansen, John H L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Applied sciences</topic><topic>Background noise</topic><topic>Estimates</topic><topic>Exact sciences and technology</topic><topic>Information, signal and communications theory</topic><topic>mask estimation</topic><topic>Masks</topic><topic>Mathematical models</topic><topic>missing-feature</topic><topic>Noise</topic><topic>Pattern recognition</topic><topic>posterior-based representative mean (PRM) estimate</topic><topic>Reconstruction</topic><topic>robust speech recognition</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal representation. 
Spectral analysis</topic><topic>Signal, noise</topic><topic>Spectra</topic><topic>Speech</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Studies</topic><topic>Telecommunications and information theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wooil Kim</creatorcontrib><creatorcontrib>Hansen, John H L</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wooil Kim</au><au>Hansen, John H L</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2011-07-01</date><risdate>2011</risdate><volume>19</volume><issue>5</issue><spage>1434</spage><epage>1443</epage><pages>1434-1443</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>This paper proposes a novel mask estimation method for missing-feature reconstruction to 
improve speech recognition performance in various types of background noise conditions. A conventional mask estimation method based on spectral subtraction degrades performance, due to incorrect estimation of the noise signal which fails to accurately represent the variations of background noise during the incoming speech utterance. The proposed mask estimation method utilizes a Posterior-based Representative Mean (PRM) estimate for determining the reliability of the input speech spectral components, which is obtained as a weighted sum of the mean parameters of the speech model using the posterior probability. To obtain the noise-corrupted speech model, a model combination method is employed, which was proposed in our previous study for a feature compensation method. Experimental results demonstrate that the proposed mask estimation method provides more separable distributions for the reliable/unreliable component classifier compared to the conventional mask estimation method. The recognition performance is evaluated using the Aurora 2.0 framework over various types of background noise conditions and the CU-Move real-life in-vehicle corpus. The performance evaluation shows that the proposed mask estimation method is considerably more effective at increasing speech recognition performance in various types of background noise conditions, compared to the conventional mask estimation method which is based on spectral subtraction. By employing the proposed PRM-based mask estimation for missing-feature reconstruction, we obtain +23.41% and +9.45% average relative improvements in word error rate for all four types of noise conditions and CU-Move corpus, respectively, compared to conventional mask estimation methods.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2010.2091633</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2011-07, Vol.19 (5), p.1434-1443
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_pascalfrancis_primary_24286322
source IEEE Xplore (Online service)
subjects Applied sciences
Background noise
Estimates
Exact sciences and technology
Information, signal and communications theory
mask estimation
Masks
Mathematical models
missing-feature
Noise
Pattern recognition
posterior-based representative mean (PRM) estimate
Reconstruction
robust speech recognition
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal, noise
Spectra
Speech
Speech processing
Speech recognition
Studies
Telecommunications and information theory
title A Novel Mask Estimation Method Employing Posterior-Based Representative Mean Estimate for Missing-Feature Speech Recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T16%3A01%3A13IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Novel%20Mask%20Estimation%20Method%20Employing%20Posterior-Based%20Representative%20Mean%20Estimate%20for%20Missing-Feature%20Speech%20Recognition&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Wooil%20Kim&rft.date=2011-07-01&rft.volume=19&rft.issue=5&rft.spage=1434&rft.epage=1443&rft.pages=1434-1443&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2010.2091633&rft_dat=%3Cproquest_pasca%3E2559708021%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c355t-b2b84daae63248877751058632599c2d059fc5468259cf3af8cade4b6c0b2dd23%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=915667881&rft_id=info:pmid/&rft_ieee_id=5667043&rfr_iscdi=true
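
The description field above defines the PRM estimate as a posterior-weighted sum of the speech model's mean parameters. The sketch below illustrates that one idea under several assumptions that are not stated in the record: a diagonal-covariance Gaussian mixture, posteriors computed under the noise-corrupted model while the clean-model means are the ones averaged, and a simple fixed-margin comparison standing in for the reliable/unreliable classifier. All function and parameter names are hypothetical, and this is not the authors' implementation.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given observation x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def prm_estimate(x, weights, noisy_means, noisy_vars, clean_means):
    """Posterior-weighted sum of mean vectors.

    Assumption: posteriors come from the noise-corrupted model, and the
    means being averaged belong to the clean speech model.
    """
    post = posteriors(x, weights, noisy_means, noisy_vars)
    return [
        sum(p * mu[d] for p, mu in zip(post, clean_means))
        for d in range(len(x))
    ]


def reliability_mask(x, estimate, margin=1.0):
    """Hypothetical decision rule: a component counts as reliable when the
    observation lies within `margin` of the estimate. The record does not
    specify the classifier, so this threshold is only a stand-in."""
    return [abs(xi - ei) <= margin for xi, ei in zip(x, estimate)]


if __name__ == "__main__":
    weights = [0.5, 0.5]
    noisy_means = [[0.0, 0.0], [10.0, 10.0]]
    noisy_vars = [[1.0, 1.0], [1.0, 1.0]]
    clean_means = [[-1.0, -1.0], [9.0, 9.0]]
    observation = [5.0, 5.0]
    estimate = prm_estimate(observation, weights, noisy_means, noisy_vars, clean_means)
    print(estimate, reliability_mask(observation, estimate))
```

The toy parameters in the usage block are made up for illustration; in practice the mixture parameters would come from trained clean and noise-corrupted speech models.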