Beyond Class A: A Proposal for Automatic Evaluation of Discourse
The DARPA Spoken Language community has just completed the first trial evaluation of spontaneous query/response pairs in the Air Travel (ATIS) domain. Our goal has been to find a methodology for evaluating correct responses to user queries. To this end, we agreed, for the first trial evaluation, to constrain the problem in several ways:

- Database Application: constrain the application to a database query application, to ease the burden of (a) constructing the back-end and (b) determining correct responses;
- Canonical Answer: constrain answer comparison to a minimal canonical answer that imposes the fewest constraints on the form of system response displayed to a user at each site;
- Typed Input: constrain the evaluation to typed input only;
- Class A: constrain the test set to single, unambiguous, intelligible utterances taken without context that have well-defined database answers (class A sentences).

These were reasonable constraints to impose on the first trial evaluation. However, it is clear that we need to loosen these constraints to obtain a more realistic evaluation of spoken language systems. The purpose of this paper is to suggest how we can move beyond evaluation of class A sentences to an evaluation of connected dialogue, including out-of-domain queries.

Sponsored in part by DARPA.
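The canonical-answer constraint above amounts to checking whether a system's database response covers a minimal reference answer, regardless of how each site formats its display. The sketch below is a hypothetical illustration of such a comparator, not the official ATIS comparison software; the normalization rules, function names, and example tuples are assumptions introduced here.

```python
# Hypothetical sketch of a minimal canonical-answer comparator.
# Assumptions (not from the report): answers are collections of database tuples,
# and only simple case/number normalization is needed.
from typing import Iterable, Tuple


def _canonicalize(value: object) -> str:
    """Normalize one field so trivial formatting differences are not scored as errors."""
    text = str(value).strip().lower()
    # Illustrative rule only: treat "0700", "700", and "700.0" as the same value.
    if text.replace(".", "", 1).isdigit():
        text = str(float(text)).rstrip("0").rstrip(".")
    return text


def _canonical_rows(answer: Iterable[Tuple]) -> frozenset:
    """Row order and the surface form of fields should not affect the comparison."""
    return frozenset(tuple(_canonicalize(v) for v in row) for row in answer)


def answer_is_correct(system_answer: Iterable[Tuple],
                      minimal_reference: Iterable[Tuple]) -> bool:
    """Correct iff the response covers every tuple of the minimal canonical answer.
    Extra rows a site chooses to display are not penalized in this sketch."""
    return _canonical_rows(minimal_reference) <= _canonical_rows(system_answer)


if __name__ == "__main__":
    # Hypothetical query: "Show me morning flights from Boston to Dallas."
    reference = [("AA101", "BOSTON", "DALLAS", "0700")]
    hypothesis = [("aa101", "Boston", "Dallas", "700"),
                  ("dl202", "Boston", "Dallas", "830")]
    print(answer_is_correct(hypothesis, reference))  # True under these assumptions
```

Under these assumptions, additional rows or columns shown to the user do not count against the system, which mirrors the stated intent of imposing "the fewest constraints on the form of system response displayed to a user at each site."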
Main Authors: | Hirschman, Lynette; Dahl, Deborah A; McKay, Donald P; Norton, Lewis M; Linebarger, Marcia C |
---|---|
Format: | Report |
Language: | English |
Subjects: | AUTOMATION; COMMUNITIES; DATA BASES; INPUT; INTERROGATION; LANGUAGE; RESPONSE; SPEECH; TEST SETS; USER NEEDS; Voice Communications; WORDS(LANGUAGE) |
Online Access: | Request full text |
creator | Hirschman, Lynette; Dahl, Deborah A; McKay, Donald P; Norton, Lewis M; Linebarger, Marcia C |
format | report |
creationdate | 1990-01 |
creatorcontrib | UNISYS DEFENSE SYSTEMS PAOLI PA |
rights | Approved for public release; distribution is unlimited. |
fulltext | fulltext_linktorsrc |
language | eng |
recordid | cdi_dtic_stinet_ADA458704 |
source | DTIC Technical Reports |
subjects | AUTOMATION; COMMUNITIES; DATA BASES; INPUT; INTERROGATION; LANGUAGE; RESPONSE; SPEECH; TEST SETS; USER NEEDS; Voice Communications; WORDS(LANGUAGE) |
title | Beyond Class A: A Proposal for Automatic Evaluation of Discourse |
url | https://apps.dtic.mil/sti/citations/ADA458704 |