
Using Vision and Speech Features for Automated Prediction of Performance Metrics in Multimodal Dialogs

Bibliographic Details
Published in: ETS Research Report Series, Vol. 2017 (1), December 2017, pp. 1-11
Main Authors: Ramanarayanan, Vikram, Lange, Patrick, Evanini, Keelan, Molloy, Hillary, Tsuprun, Eugene, Qian, Yao, Suendermann‐Oeft, David
Format: Article
Language: English
Description
Summary: Predicting and analyzing multimodal dialog user experience (UX) metrics, such as overall call experience, caller engagement, and latency, among other metrics, in an ongoing manner is important for evaluating such systems. We investigate automated prediction of multiple such metrics collected from crowdsourced interactions with an open‐source, cloud‐based multimodal dialog system in the educational domain. We extract features from both the audio and video signals and examine the efficacy of multiple machine learning algorithms in predicting these performance metrics. The best performing audio features consist of multiple low‐level audio descriptors—intensity, loudness, cepstra, pitch, and so on—and their functionals, extracted using the OpenSMILE toolkit, while the video features are bags of visual words that use 3D Scale‐Invariant Feature Transform descriptors. We find that our proposed methods outperform the majority vote classification baseline in predicting various UX metrics rated by both the user and experts. Our results suggest that such automated prediction of performance metrics can not only inform the qualitative and quantitative analysis of dialogs but also be potentially incorporated into dialog management routines for positively impacting UX and other metrics during the course of the interaction.
Report Number: ETS RR‐17‐20
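The abstract compares the proposed classifiers against a majority-vote classification baseline. As a minimal sketch of what that baseline computes (the rating values below are hypothetical, not taken from the report), one can always predict the most frequent training label and score that prediction on held-out data:

```python
from collections import Counter

def majority_vote_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label.

    A generic sketch of a majority-vote classification baseline;
    the labels here are illustrative, not the report's actual data.
    """
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)

# Hypothetical 5-point "overall call experience" ratings.
train = [3, 4, 4, 5, 4, 2, 4, 3]
test = [4, 4, 3, 5]
label, acc = majority_vote_baseline(train, test)
# label == 4 (the most frequent training rating); acc == 0.5 on this toy test set
```

Any learned model must beat this accuracy to demonstrate that the audio and video features carry predictive signal.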
ISSN: 2330-8516
DOI: 10.1002/ets2.12146