An Information Theoretic Approach to Speaker Diarization of Meeting Data
A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition th...
Published in: | IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393 |
---|---|
Main Authors: | Vijayasenan, D. ; Valente, F. ; Bourlard, H. |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93 |
---|---|
cites | cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93 |
container_end_page | 1393 |
container_issue | 7 |
container_start_page | 1382 |
container_title | IEEE transactions on audio, speech, and language processing |
container_volume | 17 |
creator | Vijayasenan, D. ; Valente, F. ; Bourlard, H. |
description | A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization. |
doi_str_mv | 10.1109/TASL.2009.2015698 |
format | article |
fullrecord | <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_proquest_miscellaneous_36335533</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5165121</ieee_id><sourcerecordid>2568853311</sourcerecordid><originalsourceid>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</originalsourceid><addsrcrecordid>eNpd0EFLwzAUB_AgCs7pBxAvRdBbZ16TtumxbOoGEw-b55Cmr66za2rSHfTTm7Kxg5ckkN__8fgTcgt0AkCzp3W-Wk4iSjN_QJxk4oyMII5FmGYRPz-9IbkkV85tKeUs4TAi87wNFm1l7E71tWmD9QaNxb7WQd511ii9CXoTrDpUX2iDWa1s_XuQpgre0Mv2M5ipXl2Ti0o1Dm-O95h8vDyvp_Nw-f66mObLULMs6UNVVEwlrCw0FZpGWHJegijLUgAF4EXEIoaCYsoSBgBMU-WpN6oqOFYZG5PHw1y_3fceXS93tdPYNKpFs3fS51gcM-bh_T-4NXvb-t1kBinnLOPCIzggbY1zFivZ2Xqn7I8EKodi5VCsHIqVx2J95uE4WDmtmsqqVtfuFIwiKkTKBnd3cDUinr5jSGKIgP0B-IOASw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>917443948</pqid></control><display><type>article</type><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Vijayasenan, D. ; Valente, F. ; Bourlard, H.</creator><creatorcontrib>Vijayasenan, D. ; Valente, F. ; Bourlard, H.</creatorcontrib><description>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. 
We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</description><identifier>ISSN: 1558-7916</identifier><identifier>ISSN: 2329-9290</identifier><identifier>EISSN: 1558-7924</identifier><identifier>EISSN: 2329-9304</identifier><identifier>DOI: 10.1109/TASL.2009.2015698</identifier><identifier>CODEN: ITASD8</identifier><language>eng</language><publisher>Piscataway, NJ: IEEE</publisher><subject>Acoustic distortion ; Applied sciences ; Automatic speech recognition ; Error analysis ; Exact sciences and technology ; Hidden Markov models ; Indexing ; Information bottleneck (IB) ; Information theory ; Information, signal and communications theory ; Loudspeakers ; meetings data ; Mutual information ; NIST ; Signal and communications theory ; Signal processing ; Signal representation. Spectral analysis ; Signal, noise ; speaker diarization ; Speech processing ; Streaming media ; Studies ; Telecommunications and information theory ; Vocabulary</subject><ispartof>IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393</ispartof><rights>2009 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2009</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</citedby><cites>FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5165121$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=22088738$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><creatorcontrib>Bourlard, H.</creatorcontrib><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><title>IEEE transactions on audio, speech, and language processing</title><addtitle>TASL</addtitle><description>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. 
We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</description><subject>Acoustic distortion</subject><subject>Applied sciences</subject><subject>Automatic speech recognition</subject><subject>Error analysis</subject><subject>Exact sciences and technology</subject><subject>Hidden Markov models</subject><subject>Indexing</subject><subject>Information bottleneck (IB)</subject><subject>Information theory</subject><subject>Information, signal and communications theory</subject><subject>Loudspeakers</subject><subject>meetings data</subject><subject>Mutual information</subject><subject>NIST</subject><subject>Signal and communications theory</subject><subject>Signal processing</subject><subject>Signal representation. 
Spectral analysis</subject><subject>Signal, noise</subject><subject>speaker diarization</subject><subject>Speech processing</subject><subject>Streaming media</subject><subject>Studies</subject><subject>Telecommunications and information theory</subject><subject>Vocabulary</subject><issn>1558-7916</issn><issn>2329-9290</issn><issn>1558-7924</issn><issn>2329-9304</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNpd0EFLwzAUB_AgCs7pBxAvRdBbZ16TtumxbOoGEw-b55Cmr66za2rSHfTTm7Kxg5ckkN__8fgTcgt0AkCzp3W-Wk4iSjN_QJxk4oyMII5FmGYRPz-9IbkkV85tKeUs4TAi87wNFm1l7E71tWmD9QaNxb7WQd511ii9CXoTrDpUX2iDWa1s_XuQpgre0Mv2M5ipXl2Ti0o1Dm-O95h8vDyvp_Nw-f66mObLULMs6UNVVEwlrCw0FZpGWHJegijLUgAF4EXEIoaCYsoSBgBMU-WpN6oqOFYZG5PHw1y_3fceXS93tdPYNKpFs3fS51gcM-bh_T-4NXvb-t1kBinnLOPCIzggbY1zFivZ2Xqn7I8EKodi5VCsHIqVx2J95uE4WDmtmsqqVtfuFIwiKkTKBnd3cDUinr5jSGKIgP0B-IOASw</recordid><startdate>20090901</startdate><enddate>20090901</enddate><creator>Vijayasenan, D.</creator><creator>Valente, F.</creator><creator>Bourlard, H.</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20090901</creationdate><title>An Information Theoretic Approach to Speaker Diarization of Meeting Data</title><author>Vijayasenan, D. ; Valente, F. 
; Bourlard, H.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Acoustic distortion</topic><topic>Applied sciences</topic><topic>Automatic speech recognition</topic><topic>Error analysis</topic><topic>Exact sciences and technology</topic><topic>Hidden Markov models</topic><topic>Indexing</topic><topic>Information bottleneck (IB)</topic><topic>Information theory</topic><topic>Information, signal and communications theory</topic><topic>Loudspeakers</topic><topic>meetings data</topic><topic>Mutual information</topic><topic>NIST</topic><topic>Signal and communications theory</topic><topic>Signal processing</topic><topic>Signal representation. Spectral analysis</topic><topic>Signal, noise</topic><topic>speaker diarization</topic><topic>Speech processing</topic><topic>Streaming media</topic><topic>Studies</topic><topic>Telecommunications and information theory</topic><topic>Vocabulary</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><creatorcontrib>Bourlard, H.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>IEEE transactions on audio, speech, and language processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vijayasenan, D.</au><au>Valente, F.</au><au>Bourlard, H.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An Information Theoretic Approach to Speaker Diarization of Meeting Data</atitle><jtitle>IEEE transactions on audio, speech, and language processing</jtitle><stitle>TASL</stitle><date>2009-09-01</date><risdate>2009</risdate><volume>17</volume><issue>7</issue><spage>1382</spage><epage>1393</epage><pages>1382-1393</pages><issn>1558-7916</issn><issn>2329-9290</issn><eissn>1558-7924</eissn><eissn>2329-9304</eissn><coden>ITASD8</coden><abstract>A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem while minimizing the distortion between observations. This solves the problem of choosing the distance between speech segments, which becomes the Jensen-Shannon divergence as it arises from the IB objective function optimization. We discuss issues related to speaker diarization using this information theoretic framework such as the criteria for inferring the number of speakers, the tradeoff between quality and compression achieved by the diarization system, and the algorithms for optimizing the objective function. Furthermore, we benchmark the proposed system against a state-of-the-art system on the NIST RT06 (rich transcription) data set for speaker diarization of meetings. The IB-based system achieves a diarization error rate of 23.2% compared to 23.6% for the baseline system. 
This approach being mainly based on nonparametric clustering, it runs significantly faster than the baseline HMM/GMM based system, resulting in faster-than-real-time diarization.</abstract><cop>Piscataway, NJ</cop><pub>IEEE</pub><doi>10.1109/TASL.2009.2015698</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1558-7916 |
ispartof | IEEE transactions on audio, speech, and language processing, 2009-09, Vol.17 (7), p.1382-1393 |
issn | 1558-7916 ; 2329-9290 ; 1558-7924 ; 2329-9304 |
language | eng |
recordid | cdi_proquest_miscellaneous_36335533 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Acoustic distortion ; Applied sciences ; Automatic speech recognition ; Error analysis ; Exact sciences and technology ; Hidden Markov models ; Indexing ; Information bottleneck (IB) ; Information theory ; Information, signal and communications theory ; Loudspeakers ; meetings data ; Mutual information ; NIST ; Signal and communications theory ; Signal processing ; Signal representation. Spectral analysis ; Signal, noise ; speaker diarization ; Speech processing ; Streaming media ; Studies ; Telecommunications and information theory ; Vocabulary |
title | An Information Theoretic Approach to Speaker Diarization of Meeting Data |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A48%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Information%20Theoretic%20Approach%20to%20Speaker%20Diarization%20of%20Meeting%20Data&rft.jtitle=IEEE%20transactions%20on%20audio,%20speech,%20and%20language%20processing&rft.au=Vijayasenan,%20D.&rft.date=2009-09-01&rft.volume=17&rft.issue=7&rft.spage=1382&rft.epage=1393&rft.pages=1382-1393&rft.issn=1558-7916&rft.eissn=1558-7924&rft.coden=ITASD8&rft_id=info:doi/10.1109/TASL.2009.2015698&rft_dat=%3Cproquest_pasca%3E2568853311%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c396t-abf3a63dbc08c02ed44d18ddd810114b2323e80e73631113c0abc04d1afb4ef93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=917443948&rft_id=info:pmid/&rft_ieee_id=5165121&rfr_iscdi=true |
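
The abstract states that, under the IB objective, the distance between speech segments becomes the Jensen-Shannon divergence. As a rough illustration only (this is not the authors' implementation), the sketch below computes a weighted Jensen-Shannon divergence between two discrete distributions. The function names, the default equal weights, and the toy distributions are assumptions made for this example and do not come from the record.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms where p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, w_p=0.5, w_q=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q.

    w_p and w_q are mixture weights (assumed here to sum to 1); the
    equal-weight default is an illustrative choice, not taken from the paper.
    """
    # Mixture distribution that both inputs are compared against.
    m = [w_p * pi + w_q * qi for pi, qi in zip(p, q)]
    return w_p * kl_divergence(p, m) + w_q * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical toy distributions standing in for two speech segments.
    seg_a = [0.7, 0.2, 0.1]
    seg_b = [0.1, 0.3, 0.6]
    print(js_divergence(seg_a, seg_b))
```

With equal weights the divergence is symmetric, is zero for identical distributions, and is bounded above by log 2 (in nats), which makes it a convenient dissimilarity measure for clustering.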