
Multi-microphone simultaneous speakers detection and localization of multi-sources for separation and noise reduction


Bibliographic Details
Published in: EURASIP Journal on Audio, Speech, and Music Processing, 2024-10, Vol. 2024 (1), Article 50 (15 pages)
Main Authors: Schwartz, Ayal; Schwartz, Ofer; Chazan, Shlomo E.; Gannot, Sharon
Format: Article
Language: English
Description: This paper addresses the challenge of online blind speaker separation in a multi-microphone setting. The linearly constrained minimum variance (LCMV) beamformer is selected as the backbone of the separation algorithm due to its distortionless response and capacity to create a null towards interfering sources. A specific instance of the LCMV beamformer that considers acoustic propagation is implemented. In this variant, the relative transfer functions (RTFs) associated with each speaker of interest are utilized as the steering vectors of the beamformer. A control mechanism is devised to ensure robust estimation of the beamformer’s building blocks, comprising speaker activity detectors and direction of arrival (DOA) estimation branches. This control mechanism is implemented as a multi-task deep neural network (DNN). The primary task classifies each time frame based on speaker activity: no active speaker, single active speaker, or multiple active speakers. The secondary task is DOA estimation. It is implemented as a classification task, executed only for frames classified as single-speaker frames by the primary branch. The direction of the active speaker is classified into one of the multiple ranges of angles. These frames are also leveraged to estimate the RTFs using subspace estimation methods. A library of RTFs associated with these DOA ranges is then constructed, facilitating rapid acquisition of new speakers and efficient tracking of existing speakers. The proposed scheme is evaluated in both simulated and real-life recordings, encompassing static and dynamic scenarios. The benefits of the multi-task approach are showcased, and significant improvements are evident, even when the control mechanism is trained with simulated data and tested with real-life data. A comparison between the proposed scheme and the independent low-rank matrix analysis (ILRMA) algorithm reveals significant improvements in static scenarios. Furthermore, the tracking capabilities of the proposed scheme are highlighted in dynamic scenarios.
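The LCMV beamformer described in the abstract admits a compact closed form: with the per-speaker RTFs stacked as the columns of a constraint matrix C, a noise covariance R, and a desired-response vector g (1 for the target speaker, 0 for each interferer), the weights are w = R⁻¹C(CᴴR⁻¹C)⁻¹g. The following NumPy sketch is not the authors' implementation; the shapes, names, and random toy covariance are illustrative only, showing the distortionless-response and null constraints the paper relies on.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """LCMV weights w = R^-1 C (C^H R^-1 C)^-1 g.

    R: (M, M) Hermitian positive-definite noise covariance.
    C: (M, K) constraint matrix; each column is the RTF steering
       vector of one speaker.
    g: (K,) desired response: 1 for the target, 0 to null interferers.
    """
    Rinv_C = np.linalg.solve(R, C)                          # R^-1 C
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

# Toy setup: 4 microphones, 2 simultaneously active speakers.
rng = np.random.default_rng(0)
M, K = 4, 2
C = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + np.eye(M)   # Hermitian positive-definite covariance
g = np.array([1.0, 0.0])         # pass speaker 1, null speaker 2

w = lcmv_weights(R, C, g)
# By construction C^H w = g: unit (distortionless) response toward
# speaker 1 and an exact null toward speaker 2.
```

In practice such weights are computed per frequency bin in the STFT domain, and the RTF library built from the DOA-classified single-speaker frames would supply the columns of C.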
DOI: 10.1186/s13636-024-00365-3
ISSN: 1687-4714
EISSN: 1687-4722
Subjects:
Acoustic propagation
Acoustics
Algorithms
Artificial neural networks
Beamforming
Direction of arrival
DOA estimation
Engineering
Engineering Acoustics
Frames
LCMV beamforming
Localization
Mathematics in Music
Matrix methods
Methodology
Microphones
Multi-task deep learning
Relative transfer function estimation
Robust control
Separation
Signal, Image and Speech Processing
Speech
Speech activity detection
Steering
Tracking
Transfer functions