Adaptive selection of local and non-local attention mechanisms for speech enhancement
In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely c...
Published in: | Neural networks 2024-06, Vol.174, p.106236-106236, Article 106236 |
---|---|
Main Authors: | Xu, Xinmeng ; Tu, Weiping ; Yang, Yuhong |
Format: | Article |
Language: | English |
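Subjects: | Adaptive selection ; Difficulty-adjusted reward ; Local and non-local attention ; Reinforcement learning ; Speech enhancement |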
Citations: | Items that this one cites |
Online Access: | Get full text |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c311t-13c09ff29a75ce24e8a9ee8b567e3fcf8b0aa695b4c5e81f03eb2ff836336d933 |
container_end_page | 106236 |
container_issue | |
container_start_page | 106236 |
container_title | Neural networks |
container_volume | 174 |
creator | Xu, Xinmeng ; Tu, Weiping ; Yang, Yuhong |
description | In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use the attention mechanisms and which to use for the task. We train the adaptive selection network using reinforcement learning with a developed difficulty-adjusted reward that is related to the performance, complexity, and difficulty of target speech estimation from the noisy mixtures. Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with the innovative dynamic block that consists of an adaptive selection network and a local and non-local attention based speech enhancement network. In particular, the ASSENet incorporates both local and non-local attention and develops the attention mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves comparable and superior performance to existing approaches with attractive computational costs. |
doi_str_mv | 10.1016/j.neunet.2024.106236 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2974004890</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0893608024001606</els_id><sourcerecordid>2974004890</sourcerecordid><originalsourceid>FETCH-LOGICAL-c311t-13c09ff29a75ce24e8a9ee8b567e3fcf8b0aa695b4c5e81f03eb2ff836336d933</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMoun78A5EevXSdJG2aXAQRv2DBi55Dmk4wS5usTVfw3xvt6tHTMC_PzDAPIecUlhSouFovA24DTksGrMqRYFzskQWVjSpZI9k-WYBUvBQg4Ygcp7QGACErfkiOuKwzR2FBXm86s5n8BxYJe7STj6GIruijNX1hQleEGMpdN00YfoAB7ZsJPg2pcHEs0gZzUGDIocUhQ6fkwJk-4dmunpDX-7uX28dy9fzwdHuzKi2ndCopt6CcY8o0tUVWoTQKUba1aJA762QLxghVt5WtUVIHHFvmnOSCc9Epzk_I5bx3M8b3LaZJDz5Z7HsTMG6TZqqpACqpIKPVjNoxpjSi05vRD2b81BT0t1C91rNQ_S1Uz0Lz2MXuwrYdsPsb-jWYgesZwPznh8dRJ-sxe-j8mH3qLvr_L3wBaZ-J4w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2974004890</pqid></control><display><type>article</type><title>Adaptive selection of local and non-local attention mechanisms for speech enhancement</title><source>Elsevier</source><creator>Xu, Xinmeng ; Tu, Weiping ; Yang, Yuhong</creator><creatorcontrib>Xu, Xinmeng ; Tu, Weiping ; Yang, Yuhong</creatorcontrib><description>In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use the attention mechanisms and which to use for the task. 
We train the adaptive selection network using reinforcement learning with a developed difficulty-adjusted reward that is related to the performance, complexity, and difficulty of target speech estimation from the noisy mixtures. Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with the innovative dynamic block that consists of an adaptive selection network and a local and non-local attention based speech enhancement network. In particular, the ASSENet incorporates both local and non-local attention and develops the attention mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves comparable and superior performance to existing approaches with attractive computational costs.</description><identifier>ISSN: 0893-6080</identifier><identifier>EISSN: 1879-2782</identifier><identifier>DOI: 10.1016/j.neunet.2024.106236</identifier><identifier>PMID: 38518710</identifier><language>eng</language><publisher>United States: Elsevier Ltd</publisher><subject>Adaptive selection ; Difficulty-adjusted reward ; Local and non-local attention ; Reinforcement learning ; Speech enhancement</subject><ispartof>Neural networks, 2024-06, Vol.174, p.106236-106236, Article 106236</ispartof><rights>2024 Elsevier Ltd</rights><rights>Copyright © 2024 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c311t-13c09ff29a75ce24e8a9ee8b567e3fcf8b0aa695b4c5e81f03eb2ff836336d933</cites><orcidid>0000-0002-6933-3298 ; 0000-0003-3001-7957</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38518710$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Xu, Xinmeng</creatorcontrib><creatorcontrib>Tu, Weiping</creatorcontrib><creatorcontrib>Yang, Yuhong</creatorcontrib><title>Adaptive selection of local and non-local attention mechanisms for speech enhancement</title><title>Neural networks</title><addtitle>Neural Netw</addtitle><description>In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use the attention mechanisms and which to use for the task. We train the adaptive selection network using reinforcement learning with a developed difficulty-adjusted reward that is related to the performance, complexity, and difficulty of target speech estimation from the noisy mixtures. 
Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with the innovative dynamic block that consists of an adaptive selection network and a local and non-local attention based speech enhancement network. In particular, the ASSENet incorporates both local and non-local attention and develops the attention mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves comparable and superior performance to existing approaches with attractive computational costs.</description><subject>Adaptive selection</subject><subject>Difficulty-adjusted reward</subject><subject>Local and non-local attention</subject><subject>Reinforcement learning</subject><subject>Speech enhancement</subject><issn>0893-6080</issn><issn>1879-2782</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMoun78A5EevXSdJG2aXAQRv2DBi55Dmk4wS5usTVfw3xvt6tHTMC_PzDAPIecUlhSouFovA24DTksGrMqRYFzskQWVjSpZI9k-WYBUvBQg4Ygcp7QGACErfkiOuKwzR2FBXm86s5n8BxYJe7STj6GIruijNX1hQleEGMpdN00YfoAB7ZsJPg2pcHEs0gZzUGDIocUhQ6fkwJk-4dmunpDX-7uX28dy9fzwdHuzKi2ndCopt6CcY8o0tUVWoTQKUba1aJA762QLxghVt5WtUVIHHFvmnOSCc9Epzk_I5bx3M8b3LaZJDz5Z7HsTMG6TZqqpACqpIKPVjNoxpjSi05vRD2b81BT0t1C91rNQ_S1Uz0Lz2MXuwrYdsPsb-jWYgesZwPznh8dRJ-sxe-j8mH3qLvr_L3wBaZ-J4w</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Xu, Xinmeng</creator><creator>Tu, Weiping</creator><creator>Yang, Yuhong</creator><general>Elsevier Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-6933-3298</orcidid><orcidid>https://orcid.org/0000-0003-3001-7957</orcidid></search><sort><creationdate>20240601</creationdate><title>Adaptive selection of local and non-local attention mechanisms for speech enhancement</title><author>Xu, Xinmeng ; Tu, 
Weiping ; Yang, Yuhong</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c311t-13c09ff29a75ce24e8a9ee8b567e3fcf8b0aa695b4c5e81f03eb2ff836336d933</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Adaptive selection</topic><topic>Difficulty-adjusted reward</topic><topic>Local and non-local attention</topic><topic>Reinforcement learning</topic><topic>Speech enhancement</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Xu, Xinmeng</creatorcontrib><creatorcontrib>Tu, Weiping</creatorcontrib><creatorcontrib>Yang, Yuhong</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Neural networks</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Xu, Xinmeng</au><au>Tu, Weiping</au><au>Yang, Yuhong</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Adaptive selection of local and non-local attention mechanisms for speech enhancement</atitle><jtitle>Neural networks</jtitle><addtitle>Neural Netw</addtitle><date>2024-06-01</date><risdate>2024</risdate><volume>174</volume><spage>106236</spage><epage>106236</epage><pages>106236-106236</pages><artnum>106236</artnum><issn>0893-6080</issn><eissn>1879-2782</eissn><abstract>In speech enhancement tasks, local and non-local attention mechanisms have been significantly improved and well studied. However, a natural speech signal contains many dynamic and fast-changing acoustic features, and focusing on one type of attention mechanism (local or non-local) cannot precisely capture the most discriminative information for estimating target speech from background interference. 
To address this issue, we introduce an adaptive selection network to dynamically select an appropriate route that determines whether to use the attention mechanisms and which to use for the task. We train the adaptive selection network using reinforcement learning with a developed difficulty-adjusted reward that is related to the performance, complexity, and difficulty of target speech estimation from the noisy mixtures. Consequently, we propose an Attention Selection Speech Enhancement Network (ASSENet) with the innovative dynamic block that consists of an adaptive selection network and a local and non-local attention based speech enhancement network. In particular, the ASSENet incorporates both local and non-local attention and develops the attention mechanism selection technique to explore the appropriate route of local and non-local attention mechanisms for speech enhancement tasks. The results show that our method achieves comparable and superior performance to existing approaches with attractive computational costs.</abstract><cop>United States</cop><pub>Elsevier Ltd</pub><pmid>38518710</pmid><doi>10.1016/j.neunet.2024.106236</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-6933-3298</orcidid><orcidid>https://orcid.org/0000-0003-3001-7957</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0893-6080 |
ispartof | Neural networks, 2024-06, Vol.174, p.106236-106236, Article 106236 |
issn | 0893-6080 1879-2782 |
language | eng |
recordid | cdi_proquest_miscellaneous_2974004890 |
source | Elsevier |
subjects | Adaptive selection ; Difficulty-adjusted reward ; Local and non-local attention ; Reinforcement learning ; Speech enhancement |
title | Adaptive selection of local and non-local attention mechanisms for speech enhancement |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T02%3A54%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Adaptive%20selection%20of%20local%20and%20non-local%20attention%20mechanisms%20for%20speech%20enhancement&rft.jtitle=Neural%20networks&rft.au=Xu,%20Xinmeng&rft.date=2024-06-01&rft.volume=174&rft.spage=106236&rft.epage=106236&rft.pages=106236-106236&rft.artnum=106236&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2024.106236&rft_dat=%3Cproquest_cross%3E2974004890%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c311t-13c09ff29a75ce24e8a9ee8b567e3fcf8b0aa695b4c5e81f03eb2ff836336d933%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2974004890&rft_id=info:pmid/38518710&rfr_iscdi=true |