Multistream speaker diarization through Information Bottleneck system outputs combination
Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams like MFCC and Time Delay of Arrivals (TDOA). Typically the combination happens using separate models for each feature stream. This work investigates if the combination of multip...
Main Authors: | Vijayasenan, Deepu ; Valente, Fabio ; Motlicek, Petr |
---|---|
Format: | Conference Proceeding |
Language: | English |
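Subjects: | Clustering algorithms ; Computational modeling ; diarization system combination ; Estimation ; Feature combination ; Hidden Markov models ; Information bottleneck principle ; Mel frequency cepstral coefficient ; Mutual information ; Speaker diarization ; Speech ; TDOA features |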
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 4423 |
container_issue | |
container_start_page | 4420 |
container_title | |
container_volume | |
creator | Vijayasenan, Deepu ; Valente, Fabio ; Motlicek, Petr |
description | Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams such as MFCC and Time Delay of Arrivals (TDOA). Typically, the combination happens using separate models for each feature stream. This work investigates whether multiple feature streams can instead be combined by combining the outputs of multiple diarization systems, each run on one of those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred to as system-based combination. Furthermore, the paper introduces a hybrid model-system combination. Experiments are run on data from the Rich Transcription campaigns and show that the system-based combination outperforms the model-based combination by 37% relative. The hybrid approaches improve by 10-20%. The error analysis shows that the improvements come from recordings where the individual MFCC and TDOA systems perform very differently. |
doi_str_mv | 10.1109/ICASSP.2011.5947334 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5947334</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5947334</ieee_id><sourcerecordid>5947334</sourcerecordid><originalsourceid>FETCH-LOGICAL-i220t-cb9c1001dcc831643cece3a4e861b4844adf5b5468de5bd0f2042f7b500fe3d93</originalsourceid><addsrcrecordid>eNo1UMlOwzAUNJtEKf2CXvwDKc9bHB-hYqlUBFJBglPlOC_UtEkq2zmUr6eiZS4jzYxGmiFkzGDCGJib2fR2sXidcGBsoozUQsgTcsWk0hqUMPqUDLjQJmMGPs7IyOji3yvgnAyY4pDlTJpLMorxG_bIudbKDMjnc79JPqaAtqFxi3aNgVbeBv9jk-9amlah679WdNbWXWgO2l2X0gZbdGsadzFhQ7s-bfsUqeua0rd_qWtyUdtNxNGRh-T94f5t-pTNXx73e-aZ5xxS5krjGACrnCsEy6Vw6FBYiUXOSllIaatalUrmRYWqrKDmIHmtSwVQo6iMGJLxodcj4nIbfGPDbnl8SfwCufdabA</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Multistream speaker diarization through Information Bottleneck system outputs combination</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Vijayasenan, Deepu ; Valente, Fabio ; Motlicek, Petr</creator><creatorcontrib>Vijayasenan, Deepu ; Valente, Fabio ; Motlicek, Petr</creatorcontrib><description>Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams like MFCC and Time Delay of Arrivals (TDOA). Typically the combination happens using separate models for each feature stream. This work investigates if the combination of multiple feature streams can happen through the combination of multiple diarization systems performed using those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred as system-based combination. Furthermore the paper introduces an hybrid model-system combination. 
Experiments are run on data from the Rich Transcription campaigns and show that the system based combination largely outperforms the model based combination by 37% relative. The hybrid approaches improve by 10-20%. The analysis of errors shows that the improvements come from the recordings where the individual MFCC and TDOA systems provide very different performances.</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 9781457705380</identifier><identifier>ISBN: 1457705389</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1457705397</identifier><identifier>EISBN: 9781457705373</identifier><identifier>EISBN: 9781457705397</identifier><identifier>EISBN: 1457705370</identifier><identifier>DOI: 10.1109/ICASSP.2011.5947334</identifier><language>eng</language><publisher>IEEE</publisher><subject>Clustering algorithms ; Computational modeling ; diarization system combination ; Estimation ; Feature combination ; Hidden Markov models ; Information bottleneck principle ; Mel frequency cepstral coefficient ; Mutual information ; Speaker diarization ; Speech ; TDOA features</subject><ispartof>2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, p.4420-4423</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5947334$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54555,54920,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5947334$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Vijayasenan, Deepu</creatorcontrib><creatorcontrib>Valente, Fabio</creatorcontrib><creatorcontrib>Motlicek, Petr</creatorcontrib><title>Multistream speaker diarization 
through Information Bottleneck system outputs combination</title><title>2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams like MFCC and Time Delay of Arrivals (TDOA). Typically the combination happens using separate models for each feature stream. This work investigates if the combination of multiple feature streams can happen through the combination of multiple diarization systems performed using those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred as system-based combination. Furthermore the paper introduces an hybrid model-system combination. Experiments are run on data from the Rich Transcription campaigns and show that the system based combination largely outperforms the model based combination by 37% relative. The hybrid approaches improve by 10-20%. 
The analysis of errors shows that the improvements come from the recordings where the individual MFCC and TDOA systems provide very different performances.</description><subject>Clustering algorithms</subject><subject>Computational modeling</subject><subject>diarization system combination</subject><subject>Estimation</subject><subject>Feature combination</subject><subject>Hidden Markov models</subject><subject>Information bottleneck principle</subject><subject>Mel frequency cepstral coefficient</subject><subject>Mutual information</subject><subject>Speaker diarization</subject><subject>Speech</subject><subject>TDOA features</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>9781457705380</isbn><isbn>1457705389</isbn><isbn>1457705397</isbn><isbn>9781457705373</isbn><isbn>9781457705397</isbn><isbn>1457705370</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2011</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNo1UMlOwzAUNJtEKf2CXvwDKc9bHB-hYqlUBFJBglPlOC_UtEkq2zmUr6eiZS4jzYxGmiFkzGDCGJib2fR2sXidcGBsoozUQsgTcsWk0hqUMPqUDLjQJmMGPs7IyOji3yvgnAyY4pDlTJpLMorxG_bIudbKDMjnc79JPqaAtqFxi3aNgVbeBv9jk-9amlah679WdNbWXWgO2l2X0gZbdGsadzFhQ7s-bfsUqeua0rd_qWtyUdtNxNGRh-T94f5t-pTNXx73e-aZ5xxS5krjGACrnCsEy6Vw6FBYiUXOSllIaatalUrmRYWqrKDmIHmtSwVQo6iMGJLxodcj4nIbfGPDbnl8SfwCufdabA</recordid><startdate>20110101</startdate><enddate>20110101</enddate><creator>Vijayasenan, Deepu</creator><creator>Valente, Fabio</creator><creator>Motlicek, Petr</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20110101</creationdate><title>Multistream speaker diarization through Information Bottleneck system outputs combination</title><author>Vijayasenan, Deepu ; Valente, Fabio ; Motlicek, 
Petr</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i220t-cb9c1001dcc831643cece3a4e861b4844adf5b5468de5bd0f2042f7b500fe3d93</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Clustering algorithms</topic><topic>Computational modeling</topic><topic>diarization system combination</topic><topic>Estimation</topic><topic>Feature combination</topic><topic>Hidden Markov models</topic><topic>Information bottleneck principle</topic><topic>Mel frequency cepstral coefficient</topic><topic>Mutual information</topic><topic>Speaker diarization</topic><topic>Speech</topic><topic>TDOA features</topic><toplevel>online_resources</toplevel><creatorcontrib>Vijayasenan, Deepu</creatorcontrib><creatorcontrib>Valente, Fabio</creatorcontrib><creatorcontrib>Motlicek, Petr</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEL</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vijayasenan, Deepu</au><au>Valente, Fabio</au><au>Motlicek, Petr</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Multistream speaker diarization through Information Bottleneck system outputs combination</atitle><btitle>2011 IEEE International Conference on Acoustics, Speech and Signal Processing 
(ICASSP)</btitle><stitle>ICASSP</stitle><date>2011-01-01</date><risdate>2011</risdate><spage>4420</spage><epage>4423</epage><pages>4420-4423</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>9781457705380</isbn><isbn>1457705389</isbn><eisbn>1457705397</eisbn><eisbn>9781457705373</eisbn><eisbn>9781457705397</eisbn><eisbn>1457705370</eisbn><abstract>Speaker diarization of meetings recorded with Multiple Distant Microphones makes extensive use of multiple feature streams like MFCC and Time Delay of Arrivals (TDOA). Typically the combination happens using separate models for each feature stream. This work investigates if the combination of multiple feature streams can happen through the combination of multiple diarization systems performed using those features. The paper extends the previously proposed Information Bottleneck method to handle the combination of several probabilistic diarization outputs. In contrast to the conventional model-based feature combination, this technique is referred as system-based combination. Furthermore the paper introduces an hybrid model-system combination. Experiments are run on data from the Rich Transcription campaigns and show that the system based combination largely outperforms the model based combination by 37% relative. The hybrid approaches improve by 10-20%. The analysis of errors shows that the improvements come from the recordings where the individual MFCC and TDOA systems provide very different performances.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2011.5947334</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011, p.4420-4423 |
issn | 1520-6149 ; 2379-190X |
language | eng |
recordid | cdi_ieee_primary_5947334 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Clustering algorithms ; Computational modeling ; diarization system combination ; Estimation ; Feature combination ; Hidden Markov models ; Information bottleneck principle ; Mel frequency cepstral coefficient ; Mutual information ; Speaker diarization ; Speech ; TDOA features |
title | Multistream speaker diarization through Information Bottleneck system outputs combination |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A18%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Multistream%20speaker%20diarization%20through%20Information%20Bottleneck%20system%20outputs%20combination&rft.btitle=2011%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Vijayasenan,%20Deepu&rft.date=2011-01-01&rft.spage=4420&rft.epage=4423&rft.pages=4420-4423&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781457705380&rft.isbn_list=1457705389&rft_id=info:doi/10.1109/ICASSP.2011.5947334&rft.eisbn=1457705397&rft.eisbn_list=9781457705373&rft.eisbn_list=9781457705397&rft.eisbn_list=1457705370&rft_dat=%3Cieee_6IE%3E5947334%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i220t-cb9c1001dcc831643cece3a4e861b4844adf5b5468de5bd0f2042f7b500fe3d93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5947334&rfr_iscdi=true |