
Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches

Bibliographic Details
Main Authors: Villavicencio, Aline, de Medeiros Caseli, Helena, Machado, Andre
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 35
container_issue
container_start_page 27
container_title
container_volume
creator Villavicencio, Aline
de Medeiros Caseli, Helena
Machado, Andre
description Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can negatively affect the performance of tasks and applications, and can lead to loss of information or communication errors, especially in technical domains where MWEs are frequent. This paper investigates approaches to the identification of MWEs in technical corpora based on association measures, part-of-speech information, and lexical alignment information. We examine how factors such as the sources of information used for identification and evaluation influence their performance. While the association measures emphasize recall, the alignment method focuses on precision.
doi_str_mv 10.1109/STIL.2009.33
format conference_proceeding
fullrecord <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5532435</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5532435</ieee_id><sourcerecordid>5532435</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673</originalsourceid><addsrcrecordid>eNotjL1OwzAURo0QEqh0Y2PxCyTYvtdOzFZKgUhFDM1eubHTWkqdKA4F3h7zMx0d6XwfITec5Zwzfbepq3UuGNM5wBmZ66JkhdISNEo8_3WOAlExVsIlmcfod0yoQqkS2BU5VdaFybe-MZPvA-1b-vreTf6jHy1dfQ6jS4M-ROoDrV1zCCns6GN_ND7Ee1qFk4uT36dx2NPNlJj0JzHB0kXn9-GY_rMHE13yYRh70xxcvCYXremim_9zRuqnVb18ydZvz9Vysc68ZlNm-A4ApRKaK9koLEEKBbKQ1nJhrGC6AGToQJbIkZXNTkrboMVGuNaoAmbk9u_WO-e2w-iPZvzaSgkCQcI3_nJdVw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</creator><creatorcontrib>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</creatorcontrib><description>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. 
While the association measures emphasize recall, the alignment method focuses on precision.</description><identifier>ISBN: 9781424460083</identifier><identifier>ISBN: 1424460085</identifier><identifier>EISBN: 9780769539454</identifier><identifier>EISBN: 0769539459</identifier><identifier>EISBN: 9781424460090</identifier><identifier>EISBN: 1424460093</identifier><identifier>DOI: 10.1109/STIL.2009.33</identifier><language>eng</language><publisher>IEEE</publisher><subject>Application software ; Computer science ; Global warming ; Humans ; Informatics ; Information resources ; Lexical Acquisition ; Multiword Expressions ; Natural language processing ; Natural languages ; Performance loss ; Vocabulary</subject><ispartof>2009 Seventh Brazilian Symposium in Information and Human Language Technology, 2009, p.27-35</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5532435$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5532435$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Villavicencio, Aline</creatorcontrib><creatorcontrib>de Medeiros Caseli, Helena</creatorcontrib><creatorcontrib>Machado, Andre</creatorcontrib><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><title>2009 Seventh Brazilian Symposium in Information and Human Language Technology</title><addtitle>STIL</addtitle><description>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. 
The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. While the association measures emphasize recall, the alignment method focuses on precision.</description><subject>Application software</subject><subject>Computer science</subject><subject>Global warming</subject><subject>Humans</subject><subject>Informatics</subject><subject>Information resources</subject><subject>Lexical Acquisition</subject><subject>Multiword Expressions</subject><subject>Natural language processing</subject><subject>Natural languages</subject><subject>Performance loss</subject><subject>Vocabulary</subject><isbn>9781424460083</isbn><isbn>1424460085</isbn><isbn>9780769539454</isbn><isbn>0769539459</isbn><isbn>9781424460090</isbn><isbn>1424460093</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjL1OwzAURo0QEqh0Y2PxCyTYvtdOzFZKgUhFDM1eubHTWkqdKA4F3h7zMx0d6XwfITec5Zwzfbepq3UuGNM5wBmZ66JkhdISNEo8_3WOAlExVsIlmcfod0yoQqkS2BU5VdaFybe-MZPvA-1b-vreTf6jHy1dfQ6jS4M-ROoDrV1zCCns6GN_ND7Ee1qFk4uT36dx2NPNlJj0JzHB0kXn9-GY_rMHE13yYRh70xxcvCYXremim_9zRuqnVb18ydZvz9Vysc68ZlNm-A4ApRKaK9koLEEKBbKQ1nJhrGC6AGToQJbIkZXNTkrboMVGuNaoAmbk9u_WO-e2w-iPZvzaSgkCQcI3_nJdVw</recordid><startdate>200909</startdate><enddate>200909</enddate><creator>Villavicencio, Aline</creator><creator>de Medeiros Caseli, Helena</creator><creator>Machado, 
Andre</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200909</creationdate><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><author>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Application software</topic><topic>Computer science</topic><topic>Global warming</topic><topic>Humans</topic><topic>Informatics</topic><topic>Information resources</topic><topic>Lexical Acquisition</topic><topic>Multiword Expressions</topic><topic>Natural language processing</topic><topic>Natural languages</topic><topic>Performance loss</topic><topic>Vocabulary</topic><toplevel>online_resources</toplevel><creatorcontrib>Villavicencio, Aline</creatorcontrib><creatorcontrib>de Medeiros Caseli, Helena</creatorcontrib><creatorcontrib>Machado, Andre</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Villavicencio, Aline</au><au>de Medeiros Caseli, Helena</au><au>Machado, Andre</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Identification of Multiword Expressions in Technical Domains: Investigating 
Statistical and Alignment-Based Approaches</atitle><btitle>2009 Seventh Brazilian Symposium in Information and Human Language Technology</btitle><stitle>STIL</stitle><date>2009-09</date><risdate>2009</risdate><spage>27</spage><epage>35</epage><pages>27-35</pages><isbn>9781424460083</isbn><isbn>1424460085</isbn><eisbn>9780769539454</eisbn><eisbn>0769539459</eisbn><eisbn>9781424460090</eisbn><eisbn>1424460093</eisbn><abstract>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. While the association measures emphasize recall, the alignment method focuses on precision.</abstract><pub>IEEE</pub><doi>10.1109/STIL.2009.33</doi><tpages>9</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9781424460083
ispartof 2009 Seventh Brazilian Symposium in Information and Human Language Technology, 2009, p.27-35
issn
language eng
recordid cdi_ieee_primary_5532435
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Application software
Computer science
Global warming
Humans
Informatics
Information resources
Lexical Acquisition
Multiword Expressions
Natural language processing
Natural languages
Performance loss
Vocabulary
title Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T00%3A00%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Identification%20of%20Multiword%20Expressions%20in%20Technical%20Domains:%20Investigating%20Statistical%20and%20Alignment-Based%20Approaches&rft.btitle=2009%20Seventh%20Brazilian%20Symposium%20in%20Information%20and%20Human%20Language%20Technology&rft.au=Villavicencio,%20Aline&rft.date=2009-09&rft.spage=27&rft.epage=35&rft.pages=27-35&rft.isbn=9781424460083&rft.isbn_list=1424460085&rft_id=info:doi/10.1109/STIL.2009.33&rft.eisbn=9780769539454&rft.eisbn_list=0769539459&rft.eisbn_list=9781424460090&rft.eisbn_list=1424460093&rft_dat=%3Cieee_6IE%3E5532435%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5532435&rfr_iscdi=true