Loading…
Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations
Overlapping speech has been identified as one of the main sources of errors in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown that overlapping speech is more likely to oc...
Saved in:
Published in: | IEEE/ACM transactions on audio, speech, and language processing speech, and language processing, 2014-12, Vol.22 (12), p.1688-1700 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Overlapping speech has been identified as one of the main sources of errors in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown that overlapping speech is more likely to occur at specific parts of a conversation. They have also shown that overlap occurrence is correlated with various conversational features such as speech, silence patterns and speaker turn changes. We use features capturing this higher level information from structure of a conversation such as silence and speaker change statistics to improve acoustic feature based classifier of overlapping and single-speaker speech classes. The silence and speaker change statistics are computed over a long-term window (around 3-4 seconds) and are used to predict the probability of overlap in the window. These estimates are then incorporated into a acoustic feature based classifier as prior probabilities of the classes. Experiments conducted on three corpora (AMI, NIST-RT and ICSI) have shown that the proposed method improves the performance of acoustic feature-based overlap detector on all the corpora. They also reveal that the model based on long-term conversational features used to estimate probability of overlap which is learned from AMI corpus generalizes to meetings from other corpora (NIST-RT and ICSI). Moreover, experiments on ICSI corpus reveal that the proposed method also improves laughter overlap detection. Consequently, applying overlap handling techniques to speaker diarization using the detected overlap results in reduction of diarization error rate (DER) on all the three corpora. |
---|---|
ISSN: | 2329-9290 2329-9304 |
DOI: | 10.1109/TASLP.2014.2346315 |