Multistream speaker diarization beyond two acoustic feature streams
Main Authors: | Vijayasenan, D ; Valente, F ; Bourlard, H |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 4953 |
container_issue | |
container_start_page | 4950 |
container_title | |
container_volume | |
creator | Vijayasenan, D ; Valente, F ; Bourlard, H |
description | Speaker diarization for meeting data has recently been converging towards multistream systems. The most common complementary features used in combination with MFCC are Time Delay of Arrival (TDOA) features. Other features have also been proposed, although there are no reported improvements on top of MFCC+TDOA systems. In this work we investigate the combination of other feature sets along with MFCC+TDOA. We discuss issues and problems related to the weighting of four different streams, proposing a solution based on a smoothed version of the speaker error. Experiments are presented on the NIST RT06 meeting diarization evaluation. Results reveal that the combination of four acoustic feature streams yields a 30% relative improvement with respect to the MFCC+TDOA feature combination. To the authors' best knowledge, this is the first successful attempt to improve the MFCC+TDOA baseline by including other feature streams. |
doi_str_mv | 10.1109/ICASSP.2010.5495086 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5495086</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5495086</ieee_id><sourcerecordid>5495086</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-bf51b70029cbfc384af383ebef680ea22548b533319fbeae6a2b15a68b97447b3</originalsourceid><addsrcrecordid>eNpVkMtKA0EURNsXOMZ8QTb9AxP7_VjKoFGIKETBXbg9uQ2tSSZM9yDx6w0kG1cFVZyiKEImnE05Z_7uublfLN6mgh0MrbxmzpyRsbeOK6GUEt6Yc1IJaX3NPfu8-Jdpf0kqrgWrDVf-mtzk_MUYc1a5ijQvw7qkXHqEDc07hG_s6SpBn36hpG5LA-677YqWn45C2w25pJZGhDL0SI9YviVXEdYZxycdkY_Hh_fmqZ6_zg7L53XiVpc6RM2DZUz4NsRWOgVROokBo3EMQQitXNBSSu5jQEADInANxgVvlbJBjsjk2JsQcbnr0wb6_fL0h_wDzl9RXw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Multistream speaker diarization beyond two acoustic feature streams</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Vijayasenan, D ; Valente, F ; Bourlard, H</creator><creatorcontrib>Vijayasenan, D ; Valente, F ; Bourlard, H</creatorcontrib><description>Speaker diarization for meetings data are recently converging towards multistream systems. The most common complementary features used in combination with MFCC are Time Delay of Arrival (TDOA). Also other features have been proposed although, there are no reported improvements on top of MFCC+TDOA systems. In this work we investigate the combination of other feature sets along with MFCC+TDOA. We discuss issues and problems related to the weighting of four different streams proposing a solution based on a smoothed version of the speaker error. Experiments are presented on NIST RT06 meeting diarization evaluation. Results reveal that the combination of four acoustic feature streams results in a 30% relative improvement with respect to the MFCC+TDOA feature combination. 
To the authors' best knowledge, this is the first successful attempt to improve the MFCC+TDOA baseline including other feature streams.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781424442959</identifier><identifier>ISBN: 1424442958</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781424442966</identifier><identifier>EISBN: 1424442966</identifier><identifier>DOI: 10.1109/ICASSP.2010.5495086</identifier><language>eng</language><publisher>IEEE</publisher><subject>Delay effects ; Feature combination ; Hidden Markov models ; Information bottleneck principle ; Loudspeakers ; Mel frequency cepstral coefficient ; Microphones ; NIST ; Speaker diarization ; Speech ; Streaming media ; Unsupervised learning</subject><ispartof>2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4950-4953</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5495086$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54555,54920,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5495086$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Vijayasenan, D</creatorcontrib><creatorcontrib>Valente, F</creatorcontrib><creatorcontrib>Bourlard, H</creatorcontrib><title>Multistream speaker diarization beyond two acoustic feature streams</title><title>2010 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>Speaker diarization for meetings data are recently converging towards multistream systems. The most common complementary features used in combination with MFCC are Time Delay of Arrival (TDOA). 
Also other features have been proposed although, there are no reported improvements on top of MFCC+TDOA systems. In this work we investigate the combination of other feature sets along with MFCC+TDOA. We discuss issues and problems related to the weighting of four different streams proposing a solution based on a smoothed version of the speaker error. Experiments are presented on NIST RT06 meeting diarization evaluation. Results reveal that the combination of four acoustic feature streams results in a 30% relative improvement with respect to the MFCC+TDOA feature combination. To the authors' best knowledge, this is the first successful attempt to improve the MFCC+TDOA baseline including other feature streams.</description><subject>Delay effects</subject><subject>Feature combination</subject><subject>Hidden Markov models</subject><subject>Information bottleneck principle</subject><subject>Loudspeakers</subject><subject>Mel frequency cepstral coefficient</subject><subject>Microphones</subject><subject>NIST</subject><subject>Speaker diarization</subject><subject>Speech</subject><subject>Streaming media</subject><subject>Unsupervised learning</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781424442959</isbn><isbn>1424442958</isbn><isbn>9781424442966</isbn><isbn>1424442966</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2010</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpVkMtKA0EURNsXOMZ8QTb9AxP7_VjKoFGIKETBXbg9uQ2tSSZM9yDx6w0kG1cFVZyiKEImnE05Z_7uublfLN6mgh0MrbxmzpyRsbeOK6GUEt6Yc1IJaX3NPfu8-Jdpf0kqrgWrDVf-mtzk_MUYc1a5ijQvw7qkXHqEDc07hG_s6SpBn36hpG5LA-677YqWn45C2w25pJZGhDL0SI9YviVXEdYZxycdkY_Hh_fmqZ6_zg7L53XiVpc6RM2DZUz4NsRWOgVROokBo3EMQQitXNBSSu5jQEADInANxgVvlbJBjsjk2JsQcbnr0wb6_fL0h_wDzl9RXw</recordid><startdate>201003</startdate><enddate>201003</enddate><creator>Vijayasenan, D</creator><creator>Valente, F</creator><creator>Bourlard, 
H</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201003</creationdate><title>Multistream speaker diarization beyond two acoustic feature streams</title><author>Vijayasenan, D ; Valente, F ; Bourlard, H</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-bf51b70029cbfc384af383ebef680ea22548b533319fbeae6a2b15a68b97447b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Delay effects</topic><topic>Feature combination</topic><topic>Hidden Markov models</topic><topic>Information bottleneck principle</topic><topic>Loudspeakers</topic><topic>Mel frequency cepstral coefficient</topic><topic>Microphones</topic><topic>NIST</topic><topic>Speaker diarization</topic><topic>Speech</topic><topic>Streaming media</topic><topic>Unsupervised learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Vijayasenan, D</creatorcontrib><creatorcontrib>Valente, F</creatorcontrib><creatorcontrib>Bourlard, H</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vijayasenan, D</au><au>Valente, F</au><au>Bourlard, H</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Multistream speaker diarization beyond two acoustic feature streams</atitle><btitle>2010 IEEE International Conference on Acoustics, Speech and Signal 
Processing</btitle><stitle>ICASSP</stitle><date>2010-03</date><risdate>2010</risdate><spage>4950</spage><epage>4953</epage><pages>4950-4953</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781424442959</isbn><isbn>1424442958</isbn><eisbn>9781424442966</eisbn><eisbn>1424442966</eisbn><abstract>Speaker diarization for meetings data are recently converging towards multistream systems. The most common complementary features used in combination with MFCC are Time Delay of Arrival (TDOA). Also other features have been proposed although, there are no reported improvements on top of MFCC+TDOA systems. In this work we investigate the combination of other feature sets along with MFCC+TDOA. We discuss issues and problems related to the weighting of four different streams proposing a solution based on a smoothed version of the speaker error. Experiments are presented on NIST RT06 meeting diarization evaluation. Results reveal that the combination of four acoustic feature streams results in a 30% relative improvement with respect to the MFCC+TDOA feature combination. To the authors' best knowledge, this is the first successful attempt to improve the MFCC+TDOA baseline including other feature streams.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2010.5495086</doi><tpages>4</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4950-4953 |
issn | 1520-6149 ; 2379-190X |
language | eng |
recordid | cdi_ieee_primary_5495086 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Delay effects ; Feature combination ; Hidden Markov models ; Information bottleneck principle ; Loudspeakers ; Mel frequency cepstral coefficient ; Microphones ; NIST ; Speaker diarization ; Speech ; Streaming media ; Unsupervised learning |
title | Multistream speaker diarization beyond two acoustic feature streams |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A26%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Multistream%20speaker%20diarization%20beyond%20two%20acoustic%20feature%20streams&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Vijayasenan,%20D&rft.date=2010-03&rft.spage=4950&rft.epage=4953&rft.pages=4950-4953&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424442959&rft.isbn_list=1424442958&rft_id=info:doi/10.1109/ICASSP.2010.5495086&rft.eisbn=9781424442966&rft.eisbn_list=1424442966&rft_dat=%3Cieee_6IE%3E5495086%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-bf51b70029cbfc384af383ebef680ea22548b533319fbeae6a2b15a68b97447b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5495086&rfr_iscdi=true |