Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor
We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Mala...
Published in: | IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070 |
---|---|
Main Authors: | Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A. |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83 |
---|---|
cites | cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83 |
container_end_page | 1070 |
container_issue | 5 |
container_start_page | 1061 |
container_title | IEEE transactions on audio, speech, and language processing |
container_volume | 16 |
creator | Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A. |
description | We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Unlike the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm aims to minimize the error expressed explicitly for the Mel-frequency cepstra instead of the discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M log-MMSE suppressor, since the number of channels in the Mel-frequency filter bank (23 in our case) is much smaller than the number of bins (256) in the DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively. |
doi_str_mv | 10.1109/TASL.2008.921761 |
format | article |
fullrecord | <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_20499564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4497834</ieee_id><sourcerecordid>2568776931</sourcerecordid><originalsourceid>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</originalsourceid><addsrcrecordid>eNp9kctLw0AQh4Mo-LwLXoKgnlJnn8kepdQHtAptPS-b7URX0mzcTQT_e1MqPXjwNAPzzY8ZviQ5JzAiBNTt8m4xHVGAYqQoySXZS46IEEWWK8r3dz2Rh8lxjB8AnElOjhI992Ufu3TRItr3dI7WvzWuc75JX6Nr3lKTjrGNXTB1OnONW_frbIamyRafvQmYTULwIZv5zn2ZDlfps3cR00XftgFj9OE0OahMHfHst54kr_eT5fgxm748PI3vppllBeuy0lbWMMIUKykHknNGoWKm4MQCtysg1jJDLUFghqBU1AiEsrK0Eitbrgp2ktxsc9vgP3uMnV67aLGuTYO-j7rIBXAppBjI639JJplgBOQAXv4BP3wfmuELrYYLQQHfpMEWssHHGLDSbXBrE741Ab0Rozdi9EaM3ooZVq5-c020pq6CaayLuz0KXCkh-cBdbDmHiLsx5yovGGc__nSWug</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917409045</pqid></control><display><type>article</type><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A.</creator><creatorcontrib>Dong Yu ; Li Deng ; Droppo, J. ; Jian Wu ; Yifan Gong ; Acero, A.</creatorcontrib><description>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. 
As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M's log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2008.921761</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Algorithms ; Applied sciences ; Cepstral analysis ; Channels ; Detection, estimation, filtering, equalization, prediction ; Discrete Fourier transforms ; Error analysis ; Errors ; Exact sciences and technology ; Filter bank ; Fourier transforms ; Information, signal and communications theory ; Mel frequency cepstral coefficient ; Mel-frequency cepstral coefficient (MFCC) ; minimum-mean-square-error (MMSE) estimate ; Miscellaneous ; Noise ; Noise level ; Noise reduction ; Noise robustness ; phase asynchrony ; robust automatic speech recognition (ASR) ; Signal and communications theory ; Signal processing ; Signal, noise ; Spectra ; Speech 
processing ; Speech recognition ; Statistics ; Studies ; Suppressors ; Telecommunications and information theory</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070</ispartof><rights>2008 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2008</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</citedby><cites>FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4497834$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20499564$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Dong Yu</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Droppo, J.</creatorcontrib><creatorcontrib>Jian Wu</creatorcontrib><creatorcontrib>Yifan Gong</creatorcontrib><creatorcontrib>Acero, A.</creatorcontrib><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. 
Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M's log-MMSE noise suppressor. 
The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</description><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Cepstral analysis</subject><subject>Channels</subject><subject>Detection, estimation, filtering, equalization, prediction</subject><subject>Discrete Fourier transforms</subject><subject>Error analysis</subject><subject>Errors</subject><subject>Exact sciences and technology</subject><subject>Filter bank</subject><subject>Fourier transforms</subject><subject>Information, signal and communications theory</subject><subject>Mel frequency cepstral coefficient</subject><subject>Mel-frequency cepstral coefficient (MFCC)</subject><subject>minimum-mean-square-error (MMSE) estimate</subject><subject>Miscellaneous</subject><subject>Noise</subject><subject>Noise level</subject><subject>Noise reduction</subject><subject>Noise robustness</subject><subject>phase asynchrony</subject><subject>robust automatic speech recognition (ASR)</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal, noise</subject><subject>Spectra</subject><subject>Speech processing</subject><subject>Speech recognition</subject><subject>Statistics</subject><subject>Studies</subject><subject>Suppressors</subject><subject>Telecommunications and information 
theory</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNp9kctLw0AQh4Mo-LwLXoKgnlJnn8kepdQHtAptPS-b7URX0mzcTQT_e1MqPXjwNAPzzY8ZviQ5JzAiBNTt8m4xHVGAYqQoySXZS46IEEWWK8r3dz2Rh8lxjB8AnElOjhI992Ufu3TRItr3dI7WvzWuc75JX6Nr3lKTjrGNXTB1OnONW_frbIamyRafvQmYTULwIZv5zn2ZDlfps3cR00XftgFj9OE0OahMHfHst54kr_eT5fgxm748PI3vppllBeuy0lbWMMIUKykHknNGoWKm4MQCtysg1jJDLUFghqBU1AiEsrK0Eitbrgp2ktxsc9vgP3uMnV67aLGuTYO-j7rIBXAppBjI639JJplgBOQAXv4BP3wfmuELrYYLQQHfpMEWssHHGLDSbXBrE741Ab0Rozdi9EaM3ooZVq5-c020pq6CaayLuz0KXCkh-cBdbDmHiLsx5yovGGc__nSWug</recordid><startdate>20080701</startdate><enddate>20080701</enddate><creator>Dong Yu</creator><creator>Li Deng</creator><creator>Droppo, J.</creator><creator>Jian Wu</creator><creator>Yifan Gong</creator><creator>Acero, A.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20080701</creationdate><title>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</title><author>Dong Yu ; Li Deng ; Droppo, J. 
; Jian Wu ; Yifan Gong ; Acero, A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Cepstral analysis</topic><topic>Channels</topic><topic>Detection, estimation, filtering, equalization, prediction</topic><topic>Discrete Fourier transforms</topic><topic>Error analysis</topic><topic>Errors</topic><topic>Exact sciences and technology</topic><topic>Filter bank</topic><topic>Fourier transforms</topic><topic>Information, signal and communications theory</topic><topic>Mel frequency cepstral coefficient</topic><topic>Mel-frequency cepstral coefficient (MFCC)</topic><topic>minimum-mean-square-error (MMSE) estimate</topic><topic>Miscellaneous</topic><topic>Noise</topic><topic>Noise level</topic><topic>Noise reduction</topic><topic>Noise robustness</topic><topic>phase asynchrony</topic><topic>robust automatic speech recognition (ASR)</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal, noise</topic><topic>Spectra</topic><topic>Speech processing</topic><topic>Speech recognition</topic><topic>Statistics</topic><topic>Studies</topic><topic>Suppressors</topic><topic>Telecommunications and information theory</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dong Yu</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Droppo, J.</creatorcontrib><creatorcontrib>Jian Wu</creatorcontrib><creatorcontrib>Yifan Gong</creatorcontrib><creatorcontrib>Acero, A.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE 
Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dong Yu</au><au>Li Deng</au><au>Droppo, J.</au><au>Jian Wu</au><au>Yifan Gong</au><au>Acero, A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2008-07-01</date><risdate>2008</risdate><volume>16</volume><issue>5</issue><spage>1061</spage><epage>1070</epage><pages>1061-1070</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>We present an efficient and effective nonlinear feature-domain noise suppression algorithm, motivated by the minimum-mean-square-error (MMSE) optimization criterion, for noise-robust speech recognition. Distinguishing from the log-MMSE spectral amplitude noise suppressor proposed by Ephraim and Malah (E&M), our new algorithm is aimed to minimize the error expressed explicitly for the Mel-frequency cepstra instead of discrete Fourier transform (DFT) spectra, and it operates on the Mel-frequency filter bank's output. 
As a consequence, the statistics used to estimate the suppression factor become vastly different from those used in the E&M log-MMSE suppressor. Our algorithm is significantly more efficient than the E&M's log-MMSE suppressor since the number of the channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins (256) in DFT. We have conducted extensive speech recognition experiments on the standard Aurora-3 task. The experimental results demonstrate a reduction of the recognition word error rate by 48% over the standard ICSLP02 baseline, 26% over the cepstral mean normalization baseline, and 13% over the popular E&M's log-MMSE noise suppressor. The experiments also show that our new algorithm performs slightly better than the ETSI advanced front end (AFE) on the well-matched and mid-mismatched settings, and has 8% and 10% fewer errors than our earlier SPLICE (stereo-based piecewise linear compensation for environments) system on these settings, respectively.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2008.921761</doi><tpages>10</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1558-7916 |
ispartof | IEEE transactions on audio, speech, and language processing, 2008-07, Vol.16 (5), p.1061-1070 |
issn | ISSN: 1558-7916, 2329-9290 ; EISSN: 1558-7924, 2329-9304 |
language | eng |
recordid | cdi_pascalfrancis_primary_20499564 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Algorithms ; Applied sciences ; Cepstral analysis ; Channels ; Detection, estimation, filtering, equalization, prediction ; Discrete Fourier transforms ; Error analysis ; Errors ; Exact sciences and technology ; Filter bank ; Fourier transforms ; Information, signal and communications theory ; Mel frequency cepstral coefficient ; Mel-frequency cepstral coefficient (MFCC) ; minimum-mean-square-error (MMSE) estimate ; Miscellaneous ; Noise ; Noise level ; Noise reduction ; Noise robustness ; phase asynchrony ; robust automatic speech recognition (ASR) ; Signal and communications theory ; Signal processing ; Signal, noise ; Spectra ; Speech processing ; Speech recognition ; Statistics ; Studies ; Suppressors ; Telecommunications and information theory |
title | Robust Speech Recognition Using a Cepstral Minimum-Mean-Square-Error-Motivated Noise Suppressor |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T14%3A08%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Speech%20Recognition%20Using%20a%20Cepstral%20Minimum-Mean-Square-Error-Motivated%20Noise%20Suppressor&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Dong%20Yu&rft.date=2008-07-01&rft.volume=16&rft.issue=5&rft.spage=1061&rft.epage=1070&rft.pages=1061-1070&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2008.921761&rft_dat=%3Cproquest_pasca%3E2568776931%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c383t-bcfca31393b240174320f3a841c04cd01cc3a2c1e03a1e692a5e0bfc2f5dcbd83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=917409045&rft_id=info:pmid/&rft_ieee_id=4497834&rfr_iscdi=true |
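
The efficiency point in the description (a suppression gain is computed for each of 23 Mel filter-bank channels rather than for each of 256 DFT bins) can be illustrated with a minimal Python sketch. The gain rule below is a generic Wiener-style rule chosen only for illustration; it is not the cepstral MMSE estimator of the article. The names `wiener_style_gain` and `suppress_frame`, the `floor` parameter, and the toy input values are all hypothetical; only the channel and bin counts come from the description.

```python
import math

N_MEL_CHANNELS = 23   # Mel filter-bank channels, as stated in the description
N_DFT_BINS = 256      # DFT bins, as stated in the description


def wiener_style_gain(noisy_power, noise_power, floor=0.1):
    """Illustrative per-channel gain (assumption: NOT the article's estimator).

    Uses the a posteriori SNR to form a crude a priori SNR estimate,
    then applies the Wiener rule xi / (1 + xi), bounded below by `floor`.
    """
    snr_post = noisy_power / max(noise_power, 1e-12)
    snr_prior = max(snr_post - 1.0, 0.0)
    gain = snr_prior / (1.0 + snr_prior)
    return max(gain, floor)


def suppress_frame(channel_powers, noise_powers, floor=0.1):
    """Scale each filter-bank channel power by its own gain."""
    return [
        wiener_style_gain(y, n, floor) * y
        for y, n in zip(channel_powers, noise_powers)
    ]


def log_energies(channel_powers):
    """Log filter-bank energies, the usual input to the cepstral transform."""
    return [math.log(max(p, 1e-12)) for p in channel_powers]


# Toy frame: every channel has noisy power 4.0 and noise power 1.0.
noisy = [4.0] * N_MEL_CHANNELS
noise = [1.0] * N_MEL_CHANNELS
cleaned = suppress_frame(noisy, noise)

# One gain per channel instead of one per DFT bin.
gain_evaluations_saved = N_DFT_BINS / N_MEL_CHANNELS
```

In this toy setting the suppressor evaluates 23 gains per frame where a DFT-domain suppressor would evaluate 256, roughly an 11-fold reduction in gain computations. The actual cost of the article's algorithm depends on its estimator, which this record does not describe.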