Analysis of statistical question classification for fact-based questions
Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching qu...
Published in: | Information retrieval (Boston) 2005-01, Vol.8 (3), p.481-504 |
---|---|
Main Authors: | METZLER, Donald ; CROFT, W. Bruce |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653 |
---|---|
cites | cdi_FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653 |
container_end_page | 504 |
container_issue | 3 |
container_start_page | 481 |
container_title | Information retrieval (Boston) |
container_volume | 8 |
creator | METZLER, Donald ; CROFT, W. Bruce |
description | Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome. |
doi_str_mv | 10.1007/s10791-005-6995-3 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_35009184</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>824380491</sourcerecordid><originalsourceid>FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653</originalsourceid><addsrcrecordid>eNp1kE1LAzEQhoMoWD9-gLdF0Vt0sslkN8dS1AoFL3oO2TSBLdtuzWwP_femHygInjIJz7wveRi7EfAoAKonElAZwQGQa2OQyxM2ElhJXmk0p3mWteYKtTpnF0QLANBKmRGbjleu21JLRR8LGtzQ0tB61xVfm5CnflX4zhG1MT_ur7FPRXR-4I2jMP_B6IqdRddRuD6el-zz5fljMuWz99e3yXjGvQI18ChkVQIarYNp5rpUqqkBlcHYGG-ClGUlQUZXopASEZ2ITkFoQqwhBI3ykj0cctep35fbZUs-dJ1bhX5DViKAEbXK4N0fcNFvUv4t2dJAbbDGSmfq9l9KgpZCql2nOEA-9UQpRLtO7dKlrRVgd_rtQb_N-u1Ov5V55_4Y7Cj7jMmtfEu_izr3lzn7G506g3M</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>230631345</pqid></control><display><type>article</type><title>Analysis of statistical question classification for fact-based questions</title><source>ABI/INFORM Collection</source><source>Springer Link</source><creator>METZLER, Donald ; CROFT, W. Bruce</creator><creatorcontrib>METZLER, Donald ; CROFT, W. Bruce</creatorcontrib><description>Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. 
Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.</description><identifier>ISSN: 1386-4564</identifier><identifier>EISSN: 1573-7659</identifier><identifier>DOI: 10.1007/s10791-005-6995-3</identifier><language>eng</language><publisher>Dordrecht: Springer</publisher><subject>Artificial intelligence ; Classification ; Error analysis ; Exact sciences and technology ; Information and communication sciences ; Information processing and retrieval ; Information retrieval ; Information retrieval. Man machine relationship ; Information science. Documentation ; Machine learning ; Questions ; Research process. Evaluation ; Sciences and techniques of general use ; Semantics ; Statistical process control ; Studies</subject><ispartof>Information retrieval (Boston), 2005-01, Vol.8 (3), p.481-504</ispartof><rights>2005 INIST-CNRS</rights><rights>Springer Science + Business Media, Inc. 2005</rights><rights>Springer Science + Business Media, Inc. 
2005.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653</citedby><cites>FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2908958576/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2908958576?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,11688,27924,27925,36060,36061,44363,74895</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=16585245$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>METZLER, Donald</creatorcontrib><creatorcontrib>CROFT, W. Bruce</creatorcontrib><title>Analysis of statistical question classification for fact-based questions</title><title>Information retrieval (Boston)</title><description>Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. 
Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.</description><subject>Artificial intelligence</subject><subject>Classification</subject><subject>Error analysis</subject><subject>Exact sciences and technology</subject><subject>Information and communication sciences</subject><subject>Information processing and retrieval</subject><subject>Information retrieval</subject><subject>Information retrieval. Man machine relationship</subject><subject>Information science. Documentation</subject><subject>Machine learning</subject><subject>Questions</subject><subject>Research process. Evaluation</subject><subject>Sciences and techniques of general use</subject><subject>Semantics</subject><subject>Statistical process control</subject><subject>Studies</subject><issn>1386-4564</issn><issn>1573-7659</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2005</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp1kE1LAzEQhoMoWD9-gLdF0Vt0sslkN8dS1AoFL3oO2TSBLdtuzWwP_femHygInjIJz7wveRi7EfAoAKonElAZwQGQa2OQyxM2ElhJXmk0p3mWteYKtTpnF0QLANBKmRGbjleu21JLRR8LGtzQ0tB61xVfm5CnflX4zhG1MT_ur7FPRXR-4I2jMP_B6IqdRddRuD6el-zz5fljMuWz99e3yXjGvQI18ChkVQIarYNp5rpUqqkBlcHYGG-ClGUlQUZXopASEZ2ITkFoQqwhBI3ykj0cctep35fbZUs-dJ1bhX5DViKAEbXK4N0fcNFvUv4t2dJAbbDGSmfq9l9KgpZCql2nOEA-9UQpRLtO7dKlrRVgd_rtQb_N-u1Ov5V55_4Y7Cj7jMmtfEu_izr3lzn7G506g3M</recordid><startdate>200501</startdate><enddate>200501</enddate><creator>METZLER, Donald</creator><creator>CROFT, W. 
Bruce</creator><general>Springer</general><general>Springer Nature B.V</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYYUZ</scope><scope>Q9U</scope></search><sort><creationdate>200501</creationdate><title>Analysis of statistical question classification for fact-based questions</title><author>METZLER, Donald ; CROFT, W. Bruce</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2005</creationdate><topic>Artificial intelligence</topic><topic>Classification</topic><topic>Error analysis</topic><topic>Exact sciences and technology</topic><topic>Information and communication sciences</topic><topic>Information processing and retrieval</topic><topic>Information retrieval</topic><topic>Information retrieval. Man machine relationship</topic><topic>Information science. Documentation</topic><topic>Machine learning</topic><topic>Questions</topic><topic>Research process. 
Evaluation</topic><topic>Sciences and techniques of general use</topic><topic>Semantics</topic><topic>Statistical process control</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>METZLER, Donald</creatorcontrib><creatorcontrib>CROFT, W. Bruce</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI-INFORM Complete</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection 
(Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database</collection><collection>Science Database (ProQuest)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>Information retrieval (Boston)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>METZLER, Donald</au><au>CROFT, W. Bruce</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Analysis of statistical question classification for fact-based questions</atitle><jtitle>Information retrieval (Boston)</jtitle><date>2005-01</date><risdate>2005</risdate><volume>8</volume><issue>3</issue><spage>481</spage><epage>504</epage><pages>481-504</pages><issn>1386-4564</issn><eissn>1573-7659</eissn><abstract>Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. 
Most approaches in the past have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the role different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.</abstract><cop>Dordrecht</cop><pub>Springer</pub><doi>10.1007/s10791-005-6995-3</doi><tpages>24</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1386-4564 |
ispartof | Information retrieval (Boston), 2005-01, Vol.8 (3), p.481-504 |
issn | 1386-4564 ; 1573-7659 |
language | eng |
recordid | cdi_proquest_miscellaneous_35009184 |
source | ABI/INFORM Collection; Springer Link |
subjects | Artificial intelligence ; Classification ; Error analysis ; Exact sciences and technology ; Information and communication sciences ; Information processing and retrieval ; Information retrieval ; Information retrieval. Man machine relationship ; Information science. Documentation ; Machine learning ; Questions ; Research process. Evaluation ; Sciences and techniques of general use ; Semantics ; Statistical process control ; Studies |
title | Analysis of statistical question classification for fact-based questions |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T02%3A35%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Analysis%20of%20statistical%20question%20classification%20for%20fact-based%20questions&rft.jtitle=Information%20retrieval%20(Boston)&rft.au=METZLER,%20Donald&rft.date=2005-01&rft.volume=8&rft.issue=3&rft.spage=481&rft.epage=504&rft.pages=481-504&rft.issn=1386-4564&rft.eissn=1573-7659&rft_id=info:doi/10.1007/s10791-005-6995-3&rft_dat=%3Cproquest_cross%3E824380491%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c404t-f137205966e9bd6244b805495fb9c9e3327303fa25133555a1fa40ebef80ee653%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=230631345&rft_id=info:pmid/&rfr_iscdi=true |