Speaker diarization method of telemarketer and client for improving speech dictation performance

Bibliographic Details
Published in: The Journal of Supercomputing, 2016-05, Vol. 72 (5), p. 1757-1769
Main Authors: Jung, Dahae, Bae, Min-Kyoung, Choi, Man Yong, Lee, Eui Chul, Joung, Jinoo
Format: Article
Language: English
Description
Summary: Financial institutions employ speech dictation systems that convert recorded conversations between telemarketers and clients into text. Dictation is necessary for detecting incomplete sales, in which a telemarketer fails to provide important sales information to a client. However, manual dictation takes too much time and effort, so automatic speech dictation systems are being adopted as an alternative. We suggest that, in such an automatic system, speaker diarization be performed prior to speech recognition. In this paper, we propose a diarization method based on pitch detection, which is well suited to the given condition of two speakers, a telemarketer and a client, conversing in a telephone recording. The method is based on an average short-time spectral feature and an unsupervised learning scheme. In the experiments, actual telephone recordings made for insurance contracts were used. We obtained an average Diarization Error Rate (DER) of about 6%.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-015-1470-4
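
Illustration: the summary names two ideas that a short example can make concrete, namely unsupervised separation of two speakers by pitch and scoring with DER. The sketch below is not the authors' method, whose details the summary does not give. It assumes, purely for illustration, that each speech segment has already been reduced to one mean pitch value in Hz, and it swaps in a plain one-dimensional two-means clustering as the unsupervised step. The function names `two_means` and `diarization_error_rate` and the sample pitch values are hypothetical. DER is computed at frame level in the usual way: missed speech plus false alarm plus speaker confusion, divided by total reference speech.

```python
from itertools import permutations


def two_means(values, iters=50):
    """Split 1-D values (assumed: per-segment mean pitch in Hz) into two clusters."""
    lo, hi = min(values), max(values)  # start the two centroids at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer centroid.
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(g0) / len(g0) if g0 else lo
        new_hi = sum(g1) / len(g1) if g1 else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels


def diarization_error_rate(reference, hypothesis):
    """Frame-level DER. None marks a non-speech frame.

    Cluster labels in `hypothesis` are arbitrary, so every one-to-one mapping
    onto the reference speakers is tried and the lowest error is reported.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    ref_ids = sorted({r for r in reference if r is not None}, key=str)
    hyp_ids = sorted({h for h in hypothesis if h is not None}, key=str)
    # Dummy targets let surplus hypothesis clusters stay unmatched.
    targets = ref_ids + [("unmatched", i) for i in range(len(hyp_ids))]

    best = None
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        errors = 0
        for r, h in zip(reference, hypothesis):
            if r is None and h is None:
                continue                      # both agree on non-speech
            if r is None or h is None:
                errors += 1                   # false alarm or missed speech
            elif mapping[h] != r:
                errors += 1                   # speaker confusion
        if best is None or errors < best:
            best = errors
    return best / ref_speech


# Hypothetical per-segment mean pitch values (Hz) for a two-speaker call.
pitches = [120, 125, 118, 210, 205, 215]
reference = ["telemarketer"] * 3 + ["client"] * 3
labels = two_means(pitches)
print(labels, diarization_error_rate(reference, labels))
```

On real telephone audio, pitch alone may not separate two speakers with similar voices, which is presumably one reason the summary also mentions a short-time spectral feature.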