
Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources

We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelatio...

Bibliographic Details
Published in: Computational linguistics - Association for Computational Linguistics 2014-06, Vol.40 (2), p.449-468
Main Authors: Tsvetkov, Yulia; Wintner, Shuly
Format: Article
Language: English
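Subjects: Bayesian analysis; Classification; Classifiers; Computation; Computer science; Linguistics; Networks; Tasks; Texts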
container_end_page 468
container_issue 2
container_start_page 449
container_title Computational linguistics - Association for Computational Linguistics
container_volume 40
creator Tsvetkov, Yulia
Wintner, Shuly
description We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.
doi_str_mv 10.1162/COLI_a_00177
format article
publisher MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA
rights Copyright MIT Press Journals Jun 2014
coden CLINEE
oa free_for_read
toplevel peer_reviewed
tpages 20
identifier ISSN: 0891-2017
ispartof Computational linguistics - Association for Computational Linguistics, 2014-06, Vol.40 (2), p.449-468
issn 0891-2017
1530-9312
language eng
recordid cdi_proquest_miscellaneous_1558998797
source MIT Press Direct OA Journals; EBSCOhost MLA International Bibliography With Full Text; Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); Linguistics and Language Behavior Abstracts (LLBA)
subjects Bayesian analysis
Classification
Classifiers
Computation
Computer science
Linguistics
Networks
Tasks
Texts
title Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-04T09%3A42%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_mit_j&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identification%20of%20Multiword%20Expressions%20by%20Combining%20Multiple%20Linguistic%20Information%20Sources&rft.jtitle=Computational%20linguistics%20-%20Association%20for%20Computational%20Linguistics&rft.au=Tsvetkov,%20Yulia&rft.date=2014-06-01&rft.volume=40&rft.issue=2&rft.spage=449&rft.epage=468&rft.pages=449-468&rft.issn=0891-2017&rft.eissn=1530-9312&rft.coden=CLINEE&rft_id=info:doi/10.1162/COLI_a_00177&rft_dat=%3Cproquest_mit_j%3E1559696477%3C/proquest_mit_j%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c523t-e120809a787e9c8b353f0529c5b9f6d6874b5696855186bd475a0e5e15b444393%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1543798850&rft_id=info:pmid/&rfr_iscdi=true
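
The description field states that the authors manually define interrelationships among classification features and express them in a Bayesian network. The record does not give the actual features, network structure, or parameters, so the following is only a minimal sketch of the general idea, with everything in it assumed: a class node (is the candidate a multiword expression?), two hypothetical binary features (`assoc` for high word association, `fixed` for fixed word order), one hand-specified dependency between the features, and invented probabilities. It is not the network from the article.

```python
# Toy hand-structured Bayesian network. Edges: mwe -> assoc, mwe -> fixed, assoc -> fixed.
# ASSUMPTION: feature names, structure, and all probabilities are invented for illustration.

# Prior P(mwe)
P_MWE = {True: 0.2, False: 0.8}

# P(assoc = True | mwe)
P_ASSOC = {True: 0.7, False: 0.1}

# P(fixed = True | mwe, assoc); the extra parent is what distinguishes
# this from a naive Bayes model, where features depend on the class only.
P_FIXED = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.3,
    (False, False): 0.05,
}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability that it is True."""
    return p_true if value else 1.0 - p_true


def joint(mwe, assoc, fixed):
    """Joint probability, factorized along the network's edges."""
    return (
        P_MWE[mwe]
        * bernoulli(P_ASSOC[mwe], assoc)
        * bernoulli(P_FIXED[(mwe, assoc)], fixed)
    )


def posterior_mwe(assoc, fixed):
    """P(mwe = True | assoc, fixed), obtained by enumerating the class node."""
    numerator = joint(True, assoc, fixed)
    denominator = numerator + joint(False, assoc, fixed)
    return numerator / denominator


if __name__ == "__main__":
    print(f"{posterior_mwe(True, True):.2f}")    # 0.84
    print(f"{posterior_mwe(False, False):.2f}")  # 0.03
```

With these made-up numbers, a candidate showing both cues gets a high posterior and one showing neither gets a low one. Encoding the `assoc -> fixed` edge by hand mirrors, in miniature, the idea of stating feature interrelationships explicitly rather than assuming the features are independent.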