
Corpus-based Learning of Analogies and Semantic Relations


Saved in:
Bibliographic Details
Published in: Machine learning 2005-09, Vol.60 (1-3), p.251-278
Main Authors: Turney, Peter D., Littman, Michael L.
Format: Article
Language:English
Subjects: Information retrieval; Studies
container_end_page 278
container_issue 1-3
container_start_page 251
container_title Machine learning
container_volume 60
creator Turney, Peter D.
Littman, Michael L.
description We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
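Both results in the abstract reduce to one operation: cosine similarity between "relation vectors" that summarize how a word pair co-occurs in a large corpus. The sketch below assumes such vectors have already been built (in the paper they are derived from corpus frequencies of joining phrases, e.g. "mason cuts stone"); the function names and toy vectors here are illustrative, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def solve_analogy(stem_vec, choice_vecs):
    """Return the index of the choice pair C:D whose relation vector is
    most similar to the stem pair A:B's vector (the SAT selection step)."""
    return max(range(len(choice_vecs)),
               key=lambda i: cosine(stem_vec, choice_vecs[i]))

def nearest_neighbour_label(query_vec, labeled):
    """1-nearest-neighbour by cosine similarity: `labeled` is a list of
    (relation_vector, relation_label) pairs from the training data."""
    vec, label = max(labeled, key=lambda item: cosine(query_vec, item[0]))
    return label
```

The noun-modifier classifier in the abstract is this nearest-neighbour step: a query pair such as "laser printer" inherits the semantic-relation label of the most cosine-similar training pair.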
doi 10.1007/s10994-005-0913-1
format article
identifier ISSN: 0885-6125
ispartof Machine learning, 2005-09, Vol.60 (1-3), p.251-278
issn 0885-6125
1573-0565
language eng
source Springer Link
subjects Information retrieval
Studies
title Corpus-based Learning of Analogies and Semantic Relations