An Information Theoretic Approach to Speaker Diarization of Meeting Data

Bibliographic Details
Published in: IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393
Main Authors: Vijayasenan, D., Valente, F., Bourlard, H.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93
cites cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93
container_end_page 1393
container_issue 7
container_start_page 1382
container_title IEEE transactions on audio, speech, and language processing
container_volume 17
creator Vijayasenan, D.
Valente, F.
Bourlard, H.
description A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.
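The abstract states that the distance between speech segments becomes the Jensen-Shannon divergence under the IB objective. As an illustration only, the following minimal sketch computes a weighted Jensen-Shannon divergence between two discrete distributions. The function names, the list-based representation, and the default equal weights are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the authors' system.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between p and q.

    w_p and w_q are positive mixing weights that sum to 1
    (hypothetical parameter names chosen for this sketch).
    """
    # Mixture distribution against which both inputs are compared.
    m = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl_divergence(p, m) + w_q * kl_divergence(q, m)


if __name__ == "__main__":
    a = [0.7, 0.2, 0.1]
    b = [0.1, 0.3, 0.6]
    print(js_divergence(a, b))
```

With equal weights the measure is symmetric, is zero for identical distributions, and is bounded above by log 2 (in nats), which is reached when the two distributions have disjoint support.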
doi_str_mv 10.1109/TASL.2009.2015698
format article
fullrecord <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_proquest_miscellaneous_36335533</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5165121</ieee_id><sourcerecordid>2568853311</sourcerecordid><originalsourceid>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</originalsourceid><addsrcrecordid>eNpd0EFLwzAUB_AgCs7pBxAvRdBbZ16TtumxbOoGEw-b55Cmr66za2rSHfTTm7Kxg5ckkN__8fgTcgt0AkCzp3W-Wk4iSjN_QJxk4oyMII5FmGYRPz-9IbkkV85tKeUs4TAi87wNFm1l7E71tWmD9QaNxb7WQd511ii9CXoTrDpUX2iDWa1s_XuQpgre0Mv2M5ipXl2Ti0o1Dm-O95h8vDyvp_Nw-f66mObLULMs6UNVVEwlrCw0FZpGWHJegijLUgAF4EXEIoaCYsoSBgBMU-WpN6oqOFYZG5PHw1y_3fceXS93tdPYNKpFs3fS51gcM-bh_T-4NXvb-t1kBinnLOPCIzggbY1zFivZ2Xqn7I8EKodi5VCsHIqVx2J95uE4WDmtmsqqVtfuFIwiKkTKBnd3cDUinr5jSGKIgP0B-IOASw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917443948</pqid></control><display><type>article</type><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Vijayasenan, D. ; Valente, F. ; Bourlard, H.</creator><creatorcontrib>Vijayasenan, D. ; Valente, F. ; Bourlard, H.</creatorcontrib><description>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. 
We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2009.2015698</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Acoustic distortion ; Applied sciences ; Automatic speech recognition ; Error analysis ; Exact sciences and technology ; Hidden Markov models ; Indexing ; Information bottleneck (IB) ; Information theory ; Information, signal and communications theory ; Loudspeakers ; meetings data ; Mutual information ; NIST ; Signal and communications theory ; Signal processing ; Signal representation. Spectral analysis ; Signal, noise ; speaker diarization ; Speech processing ; Streaming media ; Studies ; Telecommunications and information theory ; Vocabulary</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393</ispartof><rights>2009 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2009</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</citedby><cites>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5165121$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=22088738$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><creatorcontrib>Bourlard, H.</creatorcontrib><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. 
We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</description><subject>Acoustic distortion</subject><subject>Applied sciences</subject><subject>Automatic speech recognition</subject><subject>Error analysis</subject><subject>Exact sciences and technology</subject><subject>Hidden Markov models</subject><subject>Indexing</subject><subject>Information bottleneck (IB)</subject><subject>Information theory</subject><subject>Information, signal and communications theory</subject><subject>Loudspeakers</subject><subject>meetings data</subject><subject>Mutual information</subject><subject>NIST</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal representation. 
Spectral analysis</subject><subject>Signal, noise</subject><subject>speaker diarization</subject><subject>Speech processing</subject><subject>Streaming media</subject><subject>Studies</subject><subject>Telecommunications and information theory</subject><subject>Vocabulary</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNpd0EFLwzAUB_AgCs7pBxAvRdBbZ16TtumxbOoGEw-b55Cmr66za2rSHfTTm7Kxg5ckkN__8fgTcgt0AkCzp3W-Wk4iSjN_QJxk4oyMII5FmGYRPz-9IbkkV85tKeUs4TAi87wNFm1l7E71tWmD9QaNxb7WQd511ii9CXoTrDpUX2iDWa1s_XuQpgre0Mv2M5ipXl2Ti0o1Dm-O95h8vDyvp_Nw-f66mObLULMs6UNVVEwlrCw0FZpGWHJegijLUgAF4EXEIoaCYsoSBgBMU-WpN6oqOFYZG5PHw1y_3fceXS93tdPYNKpFs3fS51gcM-bh_T-4NXvb-t1kBinnLOPCIzggbY1zFivZ2Xqn7I8EKodi5VCsHIqVx2J95uE4WDmtmsqqVtfuFIwiKkTKBnd3cDUinr5jSGKIgP0B-IOASw</recordid><startdate>20090901</startdate><enddate>20090901</enddate><creator>Vijayasenan, D.</creator><creator>Valente, F.</creator><creator>Bourlard, H.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20090901</creationdate><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><author>Vijayasenan, D. ; Valente, F. 
; Bourlard, H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Acoustic distortion</topic><topic>Applied sciences</topic><topic>Automatic speech recognition</topic><topic>Error analysis</topic><topic>Exact sciences and technology</topic><topic>Hidden Markov models</topic><topic>Indexing</topic><topic>Information bottleneck (IB)</topic><topic>Information theory</topic><topic>Information, signal and communications theory</topic><topic>Loudspeakers</topic><topic>meetings data</topic><topic>Mutual information</topic><topic>NIST</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal representation. Spectral analysis</topic><topic>Signal, noise</topic><topic>speaker diarization</topic><topic>Speech processing</topic><topic>Streaming media</topic><topic>Studies</topic><topic>Telecommunications and information theory</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><creatorcontrib>Bourlard, H.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vijayasenan, D.</au><au>Valente, F.</au><au>Bourlard, H.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Information Theoretic Approach to Speaker Diarization of Meeting Data</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2009-09-01</date><risdate>2009</risdate><volume>17</volume><issue>7</issue><spage>1382</spage><epage>1393</epage><pages>1382-1393</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. 
This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2009.2015698</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1558-7916
ispartof IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393
issn 1558-7916
2329-9290
1558-7924
2329-9304
language eng
recordid cdi_proquest_miscellaneous_36335533
source IEEE Electronic Library (IEL) Journals
subjects Acoustic distortion
Applied sciences
Automatic speech recognition
Error analysis
Exact sciences and technology
Hidden Markov models
Indexing
Information bottleneck (IB)
Information theory
Information, signal and communications theory
Loudspeakers
meetings data
Mutual information
NIST
Signal and communications theory
Signal processing
Signal representation. Spectral analysis
Signal, noise
speaker diarization
Speech processing
Streaming media
Studies
Telecommunications and information theory
Vocabulary
title An Information Theoretic Approach to Speaker Diarization of Meeting Data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A48%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Information%20Theoretic%20Approach%20to%20Speaker%20Diarization%20of%20Meeting%20Data&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Vijayasenan,%20D.&rft.date=2009-09-01&rft.volume=17&rft.issue=7&rft.spage=1382&rft.epage=1393&rft.pages=1382-1393&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2009.2015698&rft_dat=%3Cproquest_pasca%3E2568853311%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=917443948&rft_id=info:pmid/&rft_ieee_id=5165121&rfr_iscdi=true