
Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor


Bibliographic Details
Published in:IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070
Main Authors: Dong Yu, Li Deng, Droppo, J., Jian Wu, Yifan Gong, Acero, A.
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83
cites cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83
container_end_page 1070
container_issue 5
container_start_page 1061
container_title IEEE transactions on audio, speech, and language processing
container_volume 16
creator Dong Yu
Li Deng
Droppo, J.
Jian Wu
Yifan Gong
Acero, A.
description We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm aims to minimize the error expressed explicitly for the Mel-frequency cepstra instead of the discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M log-MMSE suppressor since the number of channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in the DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.
doi_str_mv 10.1109/TASL.2008.921761
format article
fullrecord <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_20499564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4497834</ieee_id><sourcerecordid>2568776931</sourcerecordid><originalsourceid>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</originalsourceid><addsrcrecordid>eNp9kctLw0AQh4Mo-LwLXoKgnlJnn8kepdQHtAptPS-b7URX0mzcTQT_e1MqPXjwNAPzzY8ZviQ5JzAiBNTt8m4xHVGAYqQoySXZS46IEEWWK8r3dz2Rh8lxjB8AnElOjhI992Ufu3TRItr3dI7WvzWuc75JX6Nr3lKTjrGNXTB1OnONW_frbIamyRafvQmYTULwIZv5zn2ZDlfps3cR00XftgFj9OE0OahMHfHst54kr_eT5fgxm748PI3vppllBeuy0lbWMMIUKykHknNGoWKm4MQCtysg1jJDLUFghqBU1AiEsrK0Eitbrgp2ktxsc9vgP3uMnV67aLGuTYO-j7rIBXAppBjI639JJplgBOQAXv4BP3wfmuELrYYLQQHfpMEWssHHGLDSbXBrE741Ab0Rozdi9EaM3ooZVq5-c020pq6CaayLuz0KXCkh-cBdbDmHiLsx5yovGGc__nSWug</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917409045</pqid></control><display><type>article</type><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A.</creator><creatorcontrib>Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A.</creatorcontrib><description>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&amp;M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. 
As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&amp;M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&amp;M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&amp;M's log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2008.921761</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Algorithms ; Applied sciences ; Cepstral analysis ; Channels ; Detection, estimation, filtering, equalization, prediction ; Discrete Fourier transforms ; Error analysis ; Errors ; Exact sciences and technology ; Filter bank ; Fourier transforms ; Information, signal and communications theory ; Mel frequency cepstral coefficient ; Mel-frequency cepstral coefficient (MFCC) ; minimum-mean-square-error (MMSE) estimate ; Miscellaneous ; Noise ; Noise level ; Noise reduction ; Noise robustness ; phase asynchrony ; robust automatic speech recognition (ASR) ; Signal and communications theory ; Signal processing ; Signal, noise ; 
Spectra ; Speech processing ; Speech recognition ; Statistics ; Studies ; Suppressors ; Telecommunications and information theory</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070</ispartof><rights>2008 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</citedby><cites>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4497834$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=20499564$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Dong Yu</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Droppo, J.</creatorcontrib><creatorcontrib>Jian Wu</creatorcontrib><creatorcontrib>Yifan Gong</creatorcontrib><creatorcontrib>Acero, A.</creatorcontrib><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. 
Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&amp;M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&amp;M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&amp;M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&amp;M's log-MMSE noise suppressor. 
The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Cepstral analysis</subject><subject>Channels</subject><subject>Detection, estimation, filtering, equalization, prediction</subject><subject>Discrete Fourier transforms</subject><subject>Error analysis</subject><subject>Errors</subject><subject>Exact sciences and technology</subject><subject>Filter bank</subject><subject>Fourier transforms</subject><subject>Information, signal and communications theory</subject><subject>Mel frequency cepstral coefficient</subject><subject>Mel-frequency cepstral coefficient (MFCC)</subject><subject>minimum-mean-square-error (MMSE) estimate</subject><subject>Miscellaneous</subject><subject>Noise</subject><subject>Noise level</subject><subject>Noise reduction</subject><subject>Noise robustness</subject><subject>phase asynchrony</subject><subject>robust automatic speech recognition (ASR)</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal, noise</subject><subject>Spectra</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Statistics</subject><subject>Studies</subject><subject>Suppressors</subject><subject>Telecommunications and information 
theory</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNp9kctLw0AQh4Mo-LwLXoKgnlJnn8kepdQHtAptPS-b7URX0mzcTQT_e1MqPXjwNAPzzY8ZviQ5JzAiBNTt8m4xHVGAYqQoySXZS46IEEWWK8r3dz2Rh8lxjB8AnElOjhI992Ufu3TRItr3dI7WvzWuc75JX6Nr3lKTjrGNXTB1OnONW_frbIamyRafvQmYTULwIZv5zn2ZDlfps3cR00XftgFj9OE0OahMHfHst54kr_eT5fgxm748PI3vppllBeuy0lbWMMIUKykHknNGoWKm4MQCtysg1jJDLUFghqBU1AiEsrK0Eitbrgp2ktxsc9vgP3uMnV67aLGuTYO-j7rIBXAppBjI639JJplgBOQAXv4BP3wfmuELrYYLQQHfpMEWssHHGLDSbXBrE741Ab0Rozdi9EaM3ooZVq5-c020pq6CaayLuz0KXCkh-cBdbDmHiLsx5yovGGc__nSWug</recordid><startdate>20080701</startdate><enddate>20080701</enddate><creator>Dong Yu</creator><creator>Li Deng</creator><creator>Droppo, J.</creator><creator>Jian Wu</creator><creator>Yifan Gong</creator><creator>Acero, A.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20080701</creationdate><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><author>Dong Yu ; Li Deng ; Droppo, J. 
; Jian Wu ; Yifan Gong ; Acero, A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Cepstral analysis</topic><topic>Channels</topic><topic>Detection, estimation, filtering, equalization, prediction</topic><topic>Discrete Fourier transforms</topic><topic>Error analysis</topic><topic>Errors</topic><topic>Exact sciences and technology</topic><topic>Filter bank</topic><topic>Fourier transforms</topic><topic>Information, signal and communications theory</topic><topic>Mel frequency cepstral coefficient</topic><topic>Mel-frequency cepstral coefficient (MFCC)</topic><topic>minimum-mean-square-error (MMSE) estimate</topic><topic>Miscellaneous</topic><topic>Noise</topic><topic>Noise level</topic><topic>Noise reduction</topic><topic>Noise robustness</topic><topic>phase asynchrony</topic><topic>robust automatic speech recognition (ASR)</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal, noise</topic><topic>Spectra</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Statistics</topic><topic>Studies</topic><topic>Suppressors</topic><topic>Telecommunications and information theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dong Yu</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Droppo, J.</creatorcontrib><creatorcontrib>Jian Wu</creatorcontrib><creatorcontrib>Yifan Gong</creatorcontrib><creatorcontrib>Acero, A.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dong Yu</au><au>Li Deng</au><au>Droppo, J.</au><au>Jian Wu</au><au>Yifan Gong</au><au>Acero, A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2008-07-01</date><risdate>2008</risdate><volume>16</volume><issue>5</issue><spage>1061</spage><epage>1070</epage><pages>1061-1070</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&amp;M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. 
As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&amp;M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&amp;M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&amp;M's log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2008.921761</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_pascalfrancis_primary_20499564
source IEEE Electronic Library (IEL) Journals
subjects Algorithms
Applied sciences
Cepstral analysis
Channels
Detection, estimation, filtering, equalization, prediction
Discrete Fourier transforms
Error analysis
Errors
Exact sciences and technology
Filter bank
Fourier transforms
Information, signal and communications theory
Mel frequency cepstral coefficient
Mel-frequency cepstral coefficient (MFCC)
minimum-mean-square-error (MMSE) estimate
Miscellaneous
Noise
Noise level
Noise reduction
Noise robustness
phase asynchrony
robust automatic speech recognition (ASR)
Signal and communications theory
Signal processing
Signal, noise
Spectra
Speech processing
Speech recognition
Statistics
Studies
Suppressors
Telecommunications and information theory
title Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T14%3A08%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Speech%20Recognition%20Using%20a%20Cepstral%20Minimum-Mean-Square-Error-Motivated%20Noise%20Suppressor&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Dong%20Yu&rft.date=2008-07-01&rft.volume=16&rft.issue=5&rft.spage=1061&rft.epage=1070&rft.pages=1061-1070&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2008.921761&rft_dat=%3Cproquest_pasca%3E2568776931%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=917409045&rft_id=info:pmid/&rft_ieee_id=4497834&rfr_iscdi=true
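
The description field says the suppressor operates on the Mel-frequency filter bank's output (23 channels) rather than on 256 DFT bins. The following is only a minimal sketch of that data flow, not the algorithm from the article: the record does not give the cepstral-MMSE gain formula, so a floored Wiener-style gain is swapped in as a stand-in. The function names, the gain floor, the choice of 13 cepstra, and the toy input frame are all illustrative assumptions.

```python
import math

NUM_MEL_CHANNELS = 23   # filter-bank size quoted in the description field
NUM_CEPSTRA = 13        # assumption: a common MFCC count, not taken from the record


def wiener_gain(noisy_power, noise_power, floor=0.1):
    """Stand-in gain (NOT the article's estimator): Wiener rule with a floor."""
    snr_post = noisy_power / max(noise_power, 1e-12)
    prior = max(snr_post - 1.0, 0.0)  # crude a priori SNR estimate
    return max(prior / (1.0 + prior), floor)


def suppress_and_cepstra(mel_power, noise_power):
    """Apply one gain per Mel channel, then take the log and a DCT-II."""
    gains = [wiener_gain(y, n) for y, n in zip(mel_power, noise_power)]
    # For simplicity the gain is applied directly to the channel power.
    log_mel = [math.log(max(g * y, 1e-12)) for g, y in zip(gains, mel_power)]
    n = len(log_mel)
    cepstra = [
        sum(log_mel[b] * math.cos(math.pi * k * (b + 0.5) / n) for b in range(n))
        for k in range(NUM_CEPSTRA)
    ]
    return gains, cepstra


# Toy frame: flat noise floor with extra energy in channels 5..9.
noise = [1.0] * NUM_MEL_CHANNELS
frame = [1.0 + (9.0 if 5 <= b < 10 else 0.0) for b in range(NUM_MEL_CHANNELS)]
gains, cepstra = suppress_and_cepstra(frame, noise)
```

In this sketch only 23 gains are computed per frame, one per filter-bank channel, which is the source of the efficiency argument in the description; a DFT-domain suppressor would instead compute one gain for each of the 256 bins.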