
Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources

Bibliographic Details
Published in: Computational Linguistics (Association for Computational Linguistics), 2014-06, Vol. 40, No. 2, pp. 449-468
Main Authors: Tsvetkov, Yulia; Wintner, Shuly
Format: Article
Language:English
Description
Summary: We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.
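The summary describes combining features whose interrelationships are defined by hand and expressed as a Bayesian network. The following is a minimal sketch of that general idea, not the authors' model. It assumes two hypothetical binary features (`f1`, `f2`) and one binary class node (`mwe`). The hand-chosen structure is mwe → f1, mwe → f2, and f1 → f2, so the second feature depends on the first. All probabilities are illustrative placeholders rather than values from the paper. The paper's actual features, network structure, and parameter estimation are not reproduced here.

```python
# Hypothetical Bayesian network: mwe -> f1, mwe -> f2, f1 -> f2.
# All numbers are illustrative placeholders, not taken from the paper.
P_MWE = {True: 0.2, False: 0.8}            # prior P(mwe)
P_F1_GIVEN_MWE = {True: 0.7, False: 0.1}   # P(f1=True | mwe)
P_F2_GIVEN_MWE_F1 = {                      # P(f2=True | mwe, f1)
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.3,
    (False, False): 0.05,
}


def bernoulli(p_true: float, value: bool) -> float:
    """Probability of a binary variable taking `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(mwe: bool, f1: bool, f2: bool) -> float:
    """Joint probability factorized along the hand-specified edges."""
    return (P_MWE[mwe]
            * bernoulli(P_F1_GIVEN_MWE[mwe], f1)
            * bernoulli(P_F2_GIVEN_MWE_F1[(mwe, f1)], f2))


def posterior_mwe(f1: bool, f2: bool) -> float:
    """P(mwe=True | f1, f2) by exact enumeration over the class node."""
    yes = joint(True, f1, f2)
    no = joint(False, f1, f2)
    return yes / (yes + no)


def classify(f1: bool, f2: bool, threshold: float = 0.5) -> bool:
    """Label a candidate expression as an MWE if the posterior passes the threshold."""
    return posterior_mwe(f1, f2) >= threshold


if __name__ == "__main__":
    for f1 in (True, False):
        for f2 in (True, False):
            print(f1, f2, round(posterior_mwe(f1, f2), 4), classify(f1, f2))
```

With these placeholder numbers, a candidate showing both features gets a posterior of 0.84 (0.126 / 0.150), and a candidate showing neither gets about 0.034. The edge f1 → f2 is what separates this from a naive Bayes classifier: the evidence from `f2` is interpreted conditionally on `f1`, rather than treated as independent given the class.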
ISSN: 0891-2017, 1530-9312
DOI: 10.1162/COLI_a_00177