Speaker diarization of meetings based on large TDOA feature vectors

This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of meetings. TDOAs are obtained by considering all possible microphone pairs, and this approach is compared with conventional TDOA features extracted w.r.t. a reference channel. Th...

Bibliographic Details
Main Authors:	Vijayasenan, D., Valente, F.
Format:	Conference Proceeding
Language:	English
Subjects:	Acoustics ; Delay ; Hidden Markov models ; Meetings Recordings ; Microphones ; Model combination ; NIST ; Speaker diarization ; Speech ; Time Delay Of Arrival features ; Vectors
Online Access:	Request full text

cited_by
cites
container_end_page	4176
container_issue
container_start_page	4173
container_title
container_volume
creator	Vijayasenan, D. ; Valente, F.
description	This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of meetings. TDOAs are obtained by considering all possible microphone pairs, and this approach is compared with conventional TDOA features extracted w.r.t. a reference channel. The study is carried out using two systems, the first based on Gaussian Mixture Modeling and the second based on the Information Bottleneck (IB) approach. Results on the NIST RT06/RT07/RT09 evaluation datasets show a large speaker error reduction of 30% relative, going from 14.3% to 10.8% for the first and from 12.3% to 8.2% for the second, whenever the feature weighting is properly handled. Furthermore, results reveal that the IB system is more robust to different numbers of microphones even when large all-pairs TDOA vectors are used, thus outperforming the HMM/GMM system by 25% relative (8.2% error compared to 10.8%).
doi_str_mv	10.1109/ICASSP.2012.6288838
format	conference_proceeding
fullrecord	<record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_6288838</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6288838</ieee_id><sourcerecordid>6288838</sourcerecordid><originalsourceid>FETCH-LOGICAL-i220t-edc818a254d5b1655626ee252501c96de677fab35c75f8b8f6bbf96ccbe6f56a3</originalsourceid><addsrcrecordid>eNo1UG1LwzAYjG_gnP0F-5I_0JqX5mnycVSnwmBCJ_htJO2TEd3WkVRBf70F5305uOOO4wiZcVZwzszdcz1vmpdCMC4KEFprqc9IZirNS6gkYyWYczIRsjI5N-ztgtz8G6q8JBOuBMuBl-aaZCm9sxFjlEmYkLo5ov3ASLtgY_ixQ-gPtPd0jziEwzZRZxN2dBR3Nm6Rru9Xc-rRDp8R6Re2Qx_TLbnydpcwO_GUvC4e1vVTvlw9jsuXeRCCDTl2rebaClV2ynFQCgQgCiUU462BDqGqvHVStZXy2mkPznkDbesQvAIrp2T21xsQcXOMYW_j9-b0h_wFv71QJg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Speaker diarization of meetings based on large TDOA feature vectors</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Vijayasenan, D. ; Valente, F.</creator><creatorcontrib>Vijayasenan, D. ; Valente, F.</creatorcontrib><description>This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of meetings. TDOAs are obtained by considering all possible microphones pairs and this approach is compared with conventional TDOA features extracted w.r.t. a reference channel. The study is carried using two systems, the first based on Gaussian Mixture Modeling and the second based on the Information Bottleneck approach. Results on NIST RT06/RT07/RT09 evaluation datasets show a large speaker error reduction of 30% relative going from 14.3% to 10.8% for the first and from 12.3% to 8.2% for the second whenever the feature weighting is properly handled. 
Furthermore results reveal that the IB system is more robust to different number of microphones even when all pairs large TDOA vectors are used thus outperforming the HMM/GMM by 25% relative (8.2% error compared to 10.8%).</description><identifier>ISSN: 1520-6149</identifier><identifier>ISBN: 1467300454</identifier><identifier>ISBN: 9781467300452</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781467300469</identifier><identifier>EISBN: 1467300446</identifier><identifier>EISBN: 9781467300445</identifier><identifier>EISBN: 1467300462</identifier><identifier>DOI: 10.1109/ICASSP.2012.6288838</identifier><language>eng</language><publisher>IEEE</publisher><subject>Acoustics ; Delay ; Hidden Markov models ; Meetings Recordings ; Microphones ; Model combination ; NIST ; Speaker diarization ; Speech ; Time Delay Of Arrival features ; Vectors</subject><ispartof>2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, p.4173-4176</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6288838$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54555,54920,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6288838$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><title>Speaker diarization of meetings based on large TDOA feature vectors</title><title>2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of 
meetings. TDOAs are obtained by considering all possible microphones pairs and this approach is compared with conventional TDOA features extracted w.r.t. a reference channel. The study is carried using two systems, the first based on Gaussian Mixture Modeling and the second based on the Information Bottleneck approach. Results on NIST RT06/RT07/RT09 evaluation datasets show a large speaker error reduction of 30% relative going from 14.3% to 10.8% for the first and from 12.3% to 8.2% for the second whenever the feature weighting is properly handled. Furthermore results reveal that the IB system is more robust to different number of microphones even when all pairs large TDOA vectors are used thus outperforming the HMM/GMM by 25% relative (8.2% error compared to 10.8%).</description><subject>Acoustics</subject><subject>Delay</subject><subject>Hidden Markov models</subject><subject>Meetings Recordings</subject><subject>Microphones</subject><subject>Model combination</subject><subject>NIST</subject><subject>Speaker diarization</subject><subject>Speech</subject><subject>Time Delay Of Arrival features</subject><subject>Vectors</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>1467300454</isbn><isbn>9781467300452</isbn><isbn>9781467300469</isbn><isbn>1467300446</isbn><isbn>9781467300445</isbn><isbn>1467300462</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2012</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNo1UG1LwzAYjG_gnP0F-5I_0JqX5mnycVSnwmBCJ_htJO2TEd3WkVRBf70F5305uOOO4wiZcVZwzszdcz1vmpdCMC4KEFprqc9IZirNS6gkYyWYczIRsjI5N-ztgtz8G6q8JBOuBMuBl-aaZCm9sxFjlEmYkLo5ov3ASLtgY_ixQ-gPtPd0jziEwzZRZxN2dBR3Nm6Rru9Xc-rRDp8R6Re2Qx_TLbnydpcwO_GUvC4e1vVTvlw9jsuXeRCCDTl2rebaClV2ynFQCgQgCiUU462BDqGqvHVStZXy2mkPznkDbesQvAIrp2T21xsQcXOMYW_j9-b0h_wFv71QJg</recordid><startdate>20120101</startdate><enddate>20120101</enddate><creator>Vijayasenan, D.</creator><creator>Valente, 
F.</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20120101</creationdate><title>Speaker diarization of meetings based on large TDOA feature vectors</title><author>Vijayasenan, D. ; Valente, F.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i220t-edc818a254d5b1655626ee252501c96de677fab35c75f8b8f6bbf96ccbe6f56a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Acoustics</topic><topic>Delay</topic><topic>Hidden Markov models</topic><topic>Meetings Recordings</topic><topic>Microphones</topic><topic>Model combination</topic><topic>NIST</topic><topic>Speaker diarization</topic><topic>Speech</topic><topic>Time Delay Of Arrival features</topic><topic>Vectors</topic><toplevel>online_resources</toplevel><creatorcontrib>Vijayasenan, D.</creatorcontrib><creatorcontrib>Valente, F.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Vijayasenan, D.</au><au>Valente, F.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Speaker diarization of meetings based on large TDOA feature vectors</atitle><btitle>2012 IEEE International Conference on Acoustics, Speech and Signal Processing 
(ICASSP)</btitle><stitle>ICASSP</stitle><date>2012-01-01</date><risdate>2012</risdate><spage>4173</spage><epage>4176</epage><pages>4173-4176</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><isbn>1467300454</isbn><isbn>9781467300452</isbn><eisbn>9781467300469</eisbn><eisbn>1467300446</eisbn><eisbn>9781467300445</eisbn><eisbn>1467300462</eisbn><abstract>This paper investigates the use of large TDOA feature vectors together with acoustic information in speaker diarization of meetings. TDOAs are obtained by considering all possible microphones pairs and this approach is compared with conventional TDOA features extracted w.r.t. a reference channel. The study is carried using two systems, the first based on Gaussian Mixture Modeling and the second based on the Information Bottleneck approach. Results on NIST RT06/RT07/RT09 evaluation datasets show a large speaker error reduction of 30% relative going from 14.3% to 10.8% for the first and from 12.3% to 8.2% for the second whenever the feature weighting is properly handled. Furthermore results reveal that the IB system is more robust to different number of microphones even when all pairs large TDOA vectors are used thus outperforming the HMM/GMM by 25% relative (8.2% error compared to 10.8%).</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2012.6288838</doi><tpages>4</tpages><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 1520-6149
ispartof	2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, p.4173-4176
issn	1520-6149 ; 2379-190X
language	eng
recordid	cdi_ieee_primary_6288838
source	IEEE Electronic Library (IEL) Conference Proceedings
subjects	Acoustics ; Delay ; Hidden Markov models ; Meetings Recordings ; Microphones ; Model combination ; NIST ; Speaker diarization ; Speech ; Time Delay Of Arrival features ; Vectors
title	Speaker diarization of meetings based on large TDOA feature vectors
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T14%3A59%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Speaker%20diarization%20of%20meetings%20based%20on%20large%20TDOA%20feature%20vectors&rft.btitle=2012%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Vijayasenan,%20D.&rft.date=2012-01-01&rft.spage=4173&rft.epage=4176&rft.pages=4173-4176&rft.issn=1520-6149&rft.eissn=2379-190X&rft.isbn=1467300454&rft.isbn_list=9781467300452&rft_id=info:doi/10.1109/ICASSP.2012.6288838&rft.eisbn=9781467300469&rft.eisbn_list=1467300446&rft.eisbn_list=9781467300445&rft.eisbn_list=1467300462&rft_dat=%3Cieee_6IE%3E6288838%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i220t-edc818a254d5b1655626ee252501c96de677fab35c75f8b8f6bbf96ccbe6f56a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6288838&rfr_iscdi=true