Variational Bayesian speaker diarization of meeting recordings

This paper investigates the use of the Variational Bayesian (VB) framework for speaker diarization of meeting data, extending previous related work on Broadcast News audio. VB learning aims at maximizing a bound, known as the Free Energy, on the model marginal likelihood, and allows joint model learning...

Bibliographic Details
Main Authors: Valente, Fabio, Motlicek, Petr, Vijayasenan, Deepu
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 4957
container_issue
container_start_page 4954
container_title
container_volume
creator Valente, Fabio
Motlicek, Petr
Vijayasenan, Deepu
description This paper investigates the use of the Variational Bayesian (VB) framework for speaker diarization of meeting data, extending previous related work on Broadcast News audio. VB learning aims at maximizing a bound, known as the Free Energy, on the model marginal likelihood, and allows joint model learning and model selection according to the same objective function. While the BIC is valid only in the asymptotic limit, the Free Energy is always a valid bound. The paper proposes the use of the Free Energy as the objective function in speaker diarization. It can be used to select dynamically, without any supervision or tuning, the elements that typically affect diarization performance, i.e., the inferred number of speakers, the size of the GMM, and the initialization. The proposed approach is compared with a conventional state-of-the-art system on the RT06 evaluation data for meeting recording diarization and shows a relative improvement of 8.4% in speaker error.
doi_str_mv 10.1109/ICASSP.2010.5495087
format conference_proceeding
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, p.4954-4957
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_5495087
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Bayesian methods
Broadcasting
Density estimation robust algorithm
Error analysis
Meetings Data
Microphones
Probability
Speaker Diarization
Variational Bayesian Methods
title Variational Bayesian speaker diarization of meeting recordings
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A39%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Variational%20Bayesian%20speaker%20diarization%20of%20meeting%20recordings&rft.btitle=2010%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Valente,%20Fabio&rft.date=2010-01-01&rft.spage=4954&rft.epage=4957&rft.pages=4954-4957&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=9781424442959&rft.isbn_list=1424442958&rft_id=info:doi/10.1109/ICASSP.2010.5495087&rft.eisbn=9781424442966&rft.eisbn_list=1424442966&rft_dat=%3Cieee_6IE%3E5495087%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i241t-ac92f1ce07a67468f29f3efbf09d567cc6d5ad8b6c47c7d2ba6bdca8fa1be0ca3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5495087&rfr_iscdi=true
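
The description contrasts the Free Energy with the BIC. As a rough illustration, the standard textbook form of the two quantities can be written as below. The notation is an assumption and is not taken from the paper: X is the observed data, Z the hidden variables, \theta the parameters of model m, q a variational distribution, d_m the number of free parameters of m, and N the number of data points.

```latex
% Free Energy: a lower bound on the log marginal likelihood (Jensen's inequality),
% valid for any variational distribution q and any amount of data.
\ln p(X \mid m)
  = \ln \int \sum_{Z} p(X, Z, \theta \mid m)\, d\theta
  \;\ge\; \int \sum_{Z} q(Z, \theta)\,
      \ln \frac{p(X, Z, \theta \mid m)}{q(Z, \theta)}\, d\theta
  \;=\; F_m(q)

% The gap is a KL divergence, so the bound is never violated:
\ln p(X \mid m) - F_m(q)
  = \mathrm{KL}\bigl( q(Z, \theta) \,\|\, p(Z, \theta \mid X, m) \bigr) \;\ge\; 0

% BIC: an approximation of \ln p(X \mid m) that is justified only as N \to \infty,
% with \hat{\theta} the maximum-likelihood estimate.
\mathrm{BIC}(m) = \ln p(X \mid \hat{\theta}, m) - \frac{d_m}{2} \ln N
```

Under this reading, maximizing F_m(q) over q fits the model, and comparing the maximized values across candidate models m selects among them, which is the sense in which learning and model selection share one objective.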