Multistream speaker diarization through Information Bottleneck system outputs combination

Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams such as MFCC and Time Delay of Arrival (TDOA). Typically the streams are combined using a separate model for each feature stream. This work investigates if the combination of multip...

Bibliographic Details
Main Authors: Vijayasenan, Deepu; Valente, Fabio; Motlicek, Petr
Format: Conference Proceeding
Language: English
container_end_page 4423
container_start_page 4420
creator Vijayasenan, Deepu
Valente, Fabio
Motlicek, Petr
description Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams such as MFCC and Time Delay of Arrival (TDOA). Typically the streams are combined using a separate model for each feature stream. This work investigates whether multiple feature streams can instead be combined by combining the outputs of multiple diarization systems, each run on one of those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred to as system-based combination. Furthermore, the paper introduces a hybrid model-system combination. Experiments are run on data from the Rich Transcription campaigns and show that the system-based combination outperforms the model-based combination by 37% relative. The hybrid approaches improve by 10-20%. The error analysis shows that the improvements come from the recordings where the individual MFCC and TDOA systems perform very differently.
doi_str_mv 10.1109/ICASSP.2011.5947334
format conference_proceeding
publisher IEEE
isbn 9781457705380
1457705389
eisbn 1457705397
9781457705373
9781457705397
1457705370
date 2011-01-01
tpages 4
oa free_for_read
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, p.4420-4423
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_5947334
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Clustering algorithms
Computational modeling
diarization system combination
Estimation
Feature combination
Hidden Markov models
Information bottleneck principle
Mel frequency cepstral coefficient
Mutual information
Speaker diarization
Speech
TDOA features
title Multistream speaker diarization through Information Bottleneck system outputs combination
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A18%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Multistream%20speaker%20diarization%20through%20Information%20Bottleneck%20system%20outputs%20combination&rft.btitle=2011%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Vijayasenan,%20Deepu&rft.date=2011-01-01&rft.spage=4420&rft.epage=4423&rft.pages=4420-4423&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781457705380&rft.isbn_list=1457705389&rft_id=info:doi/10.1109/ICASSP.2011.5947334&rft.eisbn=1457705397&rft.eisbn_list=9781457705373&rft.eisbn_list=9781457705397&rft.eisbn_list=1457705370&rft_dat=%3Cieee_6IE%3E5947334%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i220t-cb9c1001dcc831643cece3a4e861b4844adf5b5468de5bd0f2042f7b500fe3d93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5947334&rfr_iscdi=true