
NewsQA: A Machine Comprehension Dataset

We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect t...

Bibliographic Details
Published in: arXiv.org 2017-02
Main Authors: Trischler, Adam; Wang, Tong; Yuan, Xingdi; Harris, Justin; Sordoni, Alessandro; Bachman, Philip; Suleman, Kaheer
Format: Article
Language: English
container_title arXiv.org
creator Trischler, Adam
Wang, Tong
Yuan, Xingdi
Harris, Justin
Sordoni, Alessandro
Bachman, Philip
Suleman, Kaheer
description We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs. Crowdworkers supply questions and answers based on a set of over 10,000 news articles from CNN, with answers consisting of spans of text from the corresponding articles. We collect this dataset through a four-stage process designed to solicit exploratory questions that require reasoning. A thorough analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment. We measure human performance on the dataset and compare it to several strong neural models. The performance gap between humans and machines (0.198 in F1) indicates that significant progress can be made on NewsQA through future research. The dataset is freely available at https://datasets.maluuba.com/NewsQA.
format article
publisher Ithaca: Cornell University Library, arXiv.org
date 2017-02-07
rights 2017. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
identifier EISSN: 2331-8422
ispartof arXiv.org, 2017-02
issn 2331-8422
language eng
recordid cdi_proquest_journals_2074913146
source Publicly Available Content Database
subjects Datasets
Human performance
title NewsQA: A Machine Comprehension Dataset