
Dialogue breakdown detection robust to variations in annotators and dialogue systems

Bibliographic Details
Published in: Computer Speech & Language 2019-03, Vol.54, p.31-43
Main Authors: Takayama, Junya, Nomoto, Eriko, Arase, Yuki
Format: Article
Language: English
Description
Summary:
• Different from previous studies that use a single label as the gold standard to decide dialogue breakdown, we focused on the inherent subjectivity in how people judge dialogue breakdowns and on the variationality due to the design of conversational agents.
• We propose methods for dialogue breakdown detection that first cluster annotators based on their label distributions, and then ensemble detectors trained by different sets of annotators.
• Two kinds of neural networks are used as detectors. We assume that breakdowns occur locally (word and phrase levels) and/or globally (sentence and context levels). A convolutional neural network (CNN) is applied to detect the former type of breakdown, and long short-term memory (LSTM) detects the latter type.
• Our best model achieves a 63.6% F-score on breakdown detection, which is 5.6% higher than the baseline using conditional random fields.
• This is an extended version of our workshop paper submitted to the Dialogue Breakdown Detection Challenge 3 (DBDC3). We newly conducted additional experiments that confirm the effectiveness of clustering annotators based on their annotation distributions. In addition, detailed error analysis revealed characteristics of the CNN and LSTM on the dialogue breakdown detection task.

Dialogue breakdown is a significant problem in conversational agents. Timely breakdown detection helps the agents quickly recover from mistakes, minimizing the impact on user experience. In this paper, we focus on two problems: variations in determining whether a response breaks down a conversation, i.e., subjectivity, and variations in breakdown types due to the designs of conversational agents, i.e., variationality. To address the subjectivity, which decreases the agreement rate among annotators, our methods detect a dialogue breakdown by ensembling detectors trained by different sets of annotators that are grouped using a clustering algorithm. To address the variationality, our methods apply two types of detector architectures to capture global and local breakdowns. The long short-term memory detector considers the global context, and the convolutional neural network detector is sensitive to local characteristics. The ensemble of all detectors makes the final judgment. The results of the Japanese task in the Dialogue Breakdown Detection Challenge 3 (DBDC3) confirm that our approach significantly outperforms the baseline, which uses a conventional conditional random field. Detailed error analysis reveals that our
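
The clustering and ensembling steps described in the summary can be illustrated with a minimal sketch. This record does not say which clustering algorithm or ensemble rule the authors use, so the k-means grouping and the simple probability averaging below are stand-in assumptions, as are the three-way label set `LABELS` and all function names. The CNN and LSTM detectors are not implemented; their outputs are represented by hypothetical, hard-coded probability distributions.

```python
from collections import Counter

# Assumed label set (not stated in this record): not a breakdown,
# possible breakdown, breakdown.
LABELS = ("NB", "PB", "B")


def label_distribution(annotations):
    """Return one annotator's relative label frequencies, ordered as LABELS."""
    counts = Counter(annotations)
    return tuple(counts[label] / len(annotations) for label in LABELS)


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation.

    Stand-in for the unspecified clustering algorithm; returns one
    cluster index per input point.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def ensemble(distributions):
    """Average per-detector label distributions and return (label, average)."""
    avg = [sum(col) / len(distributions) for col in zip(*distributions)]
    return LABELS[avg.index(max(avg))], avg


# Four hypothetical annotators: two lenient, two strict.
annotators = [
    ["B", "B", "B", "PB"],
    ["B", "B", "PB", "PB"],
    ["NB", "NB", "NB", "PB"],
    ["NB", "NB", "PB", "PB"],
]
clusters = kmeans([label_distribution(a) for a in annotators], k=2)  # [0, 0, 1, 1]

# Hypothetical outputs of three detectors for one system response.
label, avg = ensemble([(0.2, 0.3, 0.5), (0.6, 0.3, 0.1), (0.1, 0.2, 0.7)])
```

In a full pipeline, each cluster's annotations would supply the training labels for its own detectors, and the averaged distribution would be the ensemble's final judgment for a response.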
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2018.08.007