Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation

Owing to the non-negativity of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has achieved promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NMF fails to...

Bibliographic Details
Main Authors: Naiyang Guan, Long Lan, Dacheng Tao, Zhigang Luo, Xuejun Yang
Format: Conference Proceeding
Language: English
container_end_page 2538
container_start_page 2534
creator Naiyang Guan
Long Lan
Dacheng Tao
Zhigang Luo
Xuejun Yang
description Owing to the non-negativity of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has achieved promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NMF fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) to jointly learn a dictionary on both the speech signals of each speaker and the mixture signals to be separated. Because TNMF also encodes the mixture signals, it learns a more descriptive dictionary than NMF does and significantly boosts the separation performance. Experimental results on the popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating monophonic mixtures of speech signals of known speakers.
doi_str_mv 10.1109/ICASSP.2014.6854057
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_6854057</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6854057</ieee_id><sourcerecordid>6854057</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-64b12d87bd9f80c90c28dd5a7abe7e35cf088e510b889f994b11cc7331d376e03</originalsourceid><addsrcrecordid>eNotUF1LAzEQjKLgWfsL-nJ_IHVzSS7JoxS1QkGhFXwruWSvjXgfJNei_nqP2qfdmZ0ZmCVkxmDOGJj7l8XDev02L4CJeamlAKkuyNQozYQyptCGi0uSFVwZygx8XJGMyQJoyYS5IbcpfQKAVkJnxG-ibZM_uCEcMW-7tsWdPe2NHWL4zmvrhi6G35Hs2rzuYp6wCTQdeozHkNDn-7Db0xGNt8a2DvPUI7r9qOttPNnuyHVtvxJOz3NC3p8eN4slXb0-j1VWNDAlB1qKihVeq8qbWoMz4ArtvbTKVqiQS1eD1igZVFqb2phRzpxTnDPPVYnAJ2T2nxsQcdvH0Nj4sz0_iP8B0wpbhA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation</title><source>IEEE Xplore All Conference Series</source><creator>Naiyang Guan ; Long Lan ; Dacheng Tao ; Zhigang Luo ; Xuejun Yang</creator><creatorcontrib>Naiyang Guan ; Long Lan ; Dacheng Tao ; Zhigang Luo ; Xuejun Yang</creatorcontrib><description>Regarding the non-negativity property of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NM-F fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) to jointly learn a dictionary on both speech signals of each speaker and the mixture signals to be separated. 
Since TNMF learns a more descriptive dictionary by encoding the mixture signals than that learned by NMF, it significantly boosts the separation performance. Experiments results on a popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating the monophonic mixtures of speech signals of known speakers.</description><identifier>ISSN: 1520-6149</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781479928934</identifier><identifier>EISBN: 1479928933</identifier><identifier>DOI: 10.1109/ICASSP.2014.6854057</identifier><language>eng</language><publisher>IEEE</publisher><subject>Dictionaries ; Nonnegative matrix factorization ; Silicon ; Spectrogram ; Speech ; Speech processing ; speech separation ; Time-domain analysis ; Training ; transductive learning</subject><ispartof>2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, p.2534-2538</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6854057$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27923,54553,54930</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6854057$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Naiyang Guan</creatorcontrib><creatorcontrib>Long Lan</creatorcontrib><creatorcontrib>Dacheng Tao</creatorcontrib><creatorcontrib>Zhigang Luo</creatorcontrib><creatorcontrib>Xuejun Yang</creatorcontrib><title>Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation</title><title>2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>Regarding the 
non-negativity property of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NM-F fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) to jointly learn a dictionary on both speech signals of each speaker and the mixture signals to be separated. Since TNMF learns a more descriptive dictionary by encoding the mixture signals than that learned by NMF, it significantly boosts the separation performance. Experiments results on a popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating the monophonic mixtures of speech signals of known speakers.</description><subject>Dictionaries</subject><subject>Nonnegative matrix factorization</subject><subject>Silicon</subject><subject>Spectrogram</subject><subject>Speech</subject><subject>Speech processing</subject><subject>speech separation</subject><subject>Time-domain analysis</subject><subject>Training</subject><subject>transductive 
learning</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781479928934</isbn><isbn>1479928933</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2014</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotUF1LAzEQjKLgWfsL-nJ_IHVzSS7JoxS1QkGhFXwruWSvjXgfJNei_nqP2qfdmZ0ZmCVkxmDOGJj7l8XDev02L4CJeamlAKkuyNQozYQyptCGi0uSFVwZygx8XJGMyQJoyYS5IbcpfQKAVkJnxG-ibZM_uCEcMW-7tsWdPe2NHWL4zmvrhi6G35Hs2rzuYp6wCTQdeozHkNDn-7Db0xGNt8a2DvPUI7r9qOttPNnuyHVtvxJOz3NC3p8eN4slXb0-j1VWNDAlB1qKihVeq8qbWoMz4ArtvbTKVqiQS1eD1igZVFqb2phRzpxTnDPPVYnAJ2T2nxsQcdvH0Nj4sz0_iP8B0wpbhA</recordid><startdate>201405</startdate><enddate>201405</enddate><creator>Naiyang Guan</creator><creator>Long Lan</creator><creator>Dacheng Tao</creator><creator>Zhigang Luo</creator><creator>Xuejun Yang</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201405</creationdate><title>Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation</title><author>Naiyang Guan ; Long Lan ; Dacheng Tao ; Zhigang Luo ; Xuejun Yang</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-64b12d87bd9f80c90c28dd5a7abe7e35cf088e510b889f994b11cc7331d376e03</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Dictionaries</topic><topic>Nonnegative matrix factorization</topic><topic>Silicon</topic><topic>Spectrogram</topic><topic>Speech</topic><topic>Speech processing</topic><topic>speech separation</topic><topic>Time-domain analysis</topic><topic>Training</topic><topic>transductive learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Naiyang Guan</creatorcontrib><creatorcontrib>Long Lan</creatorcontrib><creatorcontrib>Dacheng 
Tao</creatorcontrib><creatorcontrib>Zhigang Luo</creatorcontrib><creatorcontrib>Xuejun Yang</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Naiyang Guan</au><au>Long Lan</au><au>Dacheng Tao</au><au>Zhigang Luo</au><au>Xuejun Yang</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation</atitle><btitle>2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</btitle><stitle>ICASSP</stitle><date>2014-05</date><risdate>2014</risdate><spage>2534</spage><epage>2538</epage><pages>2534-2538</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><eisbn>9781479928934</eisbn><eisbn>1479928933</eisbn><abstract>Regarding the non-negativity property of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NM-F fails to represent the mixture signals accurately because the dictionaries for speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) to jointly learn a dictionary on both speech signals of each speaker and the mixture signals to be separated. Since TNMF learns a more descriptive dictionary by encoding the mixture signals than that learned by NMF, it significantly boosts the separation performance. 
Experiments results on a popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating the monophonic mixtures of speech signals of known speakers.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2014.6854057</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, p.2534-2538
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_6854057
source IEEE Xplore All Conference Series
subjects Dictionaries
Nonnegative matrix factorization
Silicon
Spectrogram
Speech
Speech processing
speech separation
Time-domain analysis
Training
transductive learning
title Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-09T22%3A54%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Transductive%20nonnegative%20matrix%20factorization%20for%20semi-supervised%20high-performance%20speech%20separation&rft.btitle=2014%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Naiyang%20Guan&rft.date=2014-05&rft.spage=2534&rft.epage=2538&rft.pages=2534-2538&rft.issn=1520-6149&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2014.6854057&rft.eisbn=9781479928934&rft.eisbn_list=1479928933&rft_dat=%3Cieee_CHZPO%3E6854057%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-64b12d87bd9f80c90c28dd5a7abe7e35cf088e510b889f994b11cc7331d376e03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6854057&rfr_iscdi=true