Beyond Class A: A Proposal for Automatic Evaluation of Discourse
| Main Authors: | |
|---|---|
| Format: | Report |
| Language: | English |
| Subjects: | |
Summary: The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways:

- Database Application: Constrain the application to a database query application, to ease the burden of a) constructing the back-end, and b) determining correct responses;
- Canonical Answer: Constrain answer comparison to a minimal canonical answer that imposes the fewest constraints on the form of system response displayed to a user at each site;
- Typed Input: Constrain the evaluation to typed input only;
- Class A: Constrain the test set to single unambiguous intelligible utterances taken without context that have well-defined database answers (class A sentences).

These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.
Sponsored in part by DARPA.
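To make the "Canonical Answer" constraint in the summary concrete, the sketch below shows one way such a comparison could work: the scorer checks that every tuple in a minimal canonical answer is covered by some row of the system's displayed answer, so sites remain free to add extra columns or formatting. The tuple representation, the tolerance for extra columns, and the function names are assumptions chosen for illustration; they are not taken from the paper or from the actual ATIS comparator.

```python
# Illustrative sketch (not from the paper): scoring a system response against a
# minimal canonical answer, assuming both are represented as lists of tuples of
# database values. The subset-matching rule is an assumption for illustration.

def normalize(value):
    """Lower-case strings, leave other values unchanged, for case-insensitive comparison."""
    return value.lower() if isinstance(value, str) else value

def matches_canonical(system_answer, canonical_answer):
    """Return True if every tuple in the minimal canonical answer is contained
    in some row of the system's answer (row order and extra columns ignored)."""
    system_rows = [frozenset(normalize(v) for v in row) for row in system_answer]
    for ref_row in canonical_answer:
        ref = frozenset(normalize(v) for v in ref_row)
        if not any(ref <= row for row in system_rows):
            return False
    return True

# Hypothetical example: the canonical answer names only the flight IDs, while the
# system response may also display carrier and departure time without penalty.
canonical = [("UA930",), ("DL217",)]
system = [("UA930", "UNITED", "0800"), ("DL217", "DELTA", "0915")]
print(matches_canonical(system, canonical))  # True
```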