Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches
Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especial...
Main Authors: | Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 35 |
container_issue | |
container_start_page | 27 |
container_title | |
container_volume | |
creator | Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre |
description | Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors, especially in technical domains where MWEs are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on association measures, part-of-speech information, and lexical alignment information. We examine the influence of some factors on their performance, such as sources of information for identification and evaluation. While the association measures emphasize recall, the alignment method focuses on precision. |
doi_str_mv | 10.1109/STIL.2009.33 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5532435</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5532435</ieee_id><sourcerecordid>5532435</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673</originalsourceid><addsrcrecordid>eNotjL1OwzAURo0QEqh0Y2PxCyTYvtdOzFZKgUhFDM1eubHTWkqdKA4F3h7zMx0d6XwfITec5Zwzfbepq3UuGNM5wBmZ66JkhdISNEo8_3WOAlExVsIlmcfod0yoQqkS2BU5VdaFybe-MZPvA-1b-vreTf6jHy1dfQ6jS4M-ROoDrV1zCCns6GN_ND7Ee1qFk4uT36dx2NPNlJj0JzHB0kXn9-GY_rMHE13yYRh70xxcvCYXremim_9zRuqnVb18ydZvz9Vysc68ZlNm-A4ApRKaK9koLEEKBbKQ1nJhrGC6AGToQJbIkZXNTkrboMVGuNaoAmbk9u_WO-e2w-iPZvzaSgkCQcI3_nJdVw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</creator><creatorcontrib>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</creatorcontrib><description>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. 
While the association measures emphasize recall, the alignment method focuses on precision.</description><identifier>ISBN: 9781424460083</identifier><identifier>ISBN: 1424460085</identifier><identifier>EISBN: 9780769539454</identifier><identifier>EISBN: 0769539459</identifier><identifier>EISBN: 9781424460090</identifier><identifier>EISBN: 1424460093</identifier><identifier>DOI: 10.1109/STIL.2009.33</identifier><language>eng</language><publisher>IEEE</publisher><subject>Application software ; Computer science ; Global warming ; Humans ; Informatics ; Information resources ; Lexical Acquisition ; Multiword Expressions ; Natural language processing ; Natural languages ; Performance loss ; Vocabulary</subject><ispartof>2009 Seventh Brazilian Symposium in Information and Human Language Technology, 2009, p.27-35</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5532435$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5532435$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Villavicencio, Aline</creatorcontrib><creatorcontrib>de Medeiros Caseli, Helena</creatorcontrib><creatorcontrib>Machado, Andre</creatorcontrib><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><title>2009 Seventh Brazilian Symposium in Information and Human Language Technology</title><addtitle>STIL</addtitle><description>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. 
The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. While the association measures emphasize recall, the alignment method focuses on precision.</description><subject>Application software</subject><subject>Computer science</subject><subject>Global warming</subject><subject>Humans</subject><subject>Informatics</subject><subject>Information resources</subject><subject>Lexical Acquisition</subject><subject>Multiword Expressions</subject><subject>Natural language processing</subject><subject>Natural languages</subject><subject>Performance loss</subject><subject>Vocabulary</subject><isbn>9781424460083</isbn><isbn>1424460085</isbn><isbn>9780769539454</isbn><isbn>0769539459</isbn><isbn>9781424460090</isbn><isbn>1424460093</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjL1OwzAURo0QEqh0Y2PxCyTYvtdOzFZKgUhFDM1eubHTWkqdKA4F3h7zMx0d6XwfITec5Zwzfbepq3UuGNM5wBmZ66JkhdISNEo8_3WOAlExVsIlmcfod0yoQqkS2BU5VdaFybe-MZPvA-1b-vreTf6jHy1dfQ6jS4M-ROoDrV1zCCns6GN_ND7Ee1qFk4uT36dx2NPNlJj0JzHB0kXn9-GY_rMHE13yYRh70xxcvCYXremim_9zRuqnVb18ydZvz9Vysc68ZlNm-A4ApRKaK9koLEEKBbKQ1nJhrGC6AGToQJbIkZXNTkrboMVGuNaoAmbk9u_WO-e2w-iPZvzaSgkCQcI3_nJdVw</recordid><startdate>200909</startdate><enddate>200909</enddate><creator>Villavicencio, Aline</creator><creator>de Medeiros Caseli, Helena</creator><creator>Machado, 
Andre</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200909</creationdate><title>Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches</title><author>Villavicencio, Aline ; de Medeiros Caseli, Helena ; Machado, Andre</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Application software</topic><topic>Computer science</topic><topic>Global warming</topic><topic>Humans</topic><topic>Informatics</topic><topic>Information resources</topic><topic>Lexical Acquisition</topic><topic>Multiword Expressions</topic><topic>Natural language processing</topic><topic>Natural languages</topic><topic>Performance loss</topic><topic>Vocabulary</topic><toplevel>online_resources</toplevel><creatorcontrib>Villavicencio, Aline</creatorcontrib><creatorcontrib>de Medeiros Caseli, Helena</creatorcontrib><creatorcontrib>Machado, Andre</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Villavicencio, Aline</au><au>de Medeiros Caseli, Helena</au><au>Machado, Andre</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Identification of Multiword Expressions in Technical Domains: Investigating 
Statistical and Alignment-Based Approaches</atitle><btitle>2009 Seventh Brazilian Symposium in Information and Human Language Technology</btitle><stitle>STIL</stitle><date>2009-09</date><risdate>2009</risdate><spage>27</spage><epage>35</epage><pages>27-35</pages><isbn>9781424460083</isbn><isbn>1424460085</isbn><eisbn>9780769539454</eisbn><eisbn>0769539459</eisbn><eisbn>9781424460090</eisbn><eisbn>1424460093</eisbn><abstract>Multiword Expressions (MWEs) are one of the stumbling blocks for more precise Natural Language Processing (NLP) systems. The lack of coverage of MWEs in resources can impact negatively on the performance of tasks and applications, and can lead to loss of information or communication errors; especially in technical domains where MWE are frequent. This paper investigates some approaches to the identification of MWEs in technical corpora based on: association measures, part-of-speech and lexical alignment information. We examine the influence of some factors on their performance such as sources of information for identification and evaluation. While the association measures emphasize recall, the alignment method focuses on precision.</abstract><pub>IEEE</pub><doi>10.1109/STIL.2009.33</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9781424460083 |
ispartof | 2009 Seventh Brazilian Symposium in Information and Human Language Technology, 2009, p.27-35 |
issn | |
language | eng |
recordid | cdi_ieee_primary_5532435 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Application software ; Computer science ; Global warming ; Humans ; Informatics ; Information resources ; Lexical Acquisition ; Multiword Expressions ; Natural language processing ; Natural languages ; Performance loss ; Vocabulary |
title | Identification of Multiword Expressions in Technical Domains: Investigating Statistical and Alignment-Based Approaches |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T00%3A00%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Identification%20of%20Multiword%20Expressions%20in%20Technical%20Domains:%20Investigating%20Statistical%20and%20Alignment-Based%20Approaches&rft.btitle=2009%20Seventh%20Brazilian%20Symposium%20in%20Information%20and%20Human%20Language%20Technology&rft.au=Villavicencio,%20Aline&rft.date=2009-09&rft.spage=27&rft.epage=35&rft.pages=27-35&rft.isbn=9781424460083&rft.isbn_list=1424460085&rft_id=info:doi/10.1109/STIL.2009.33&rft.eisbn=9780769539454&rft.eisbn_list=0769539459&rft.eisbn_list=9781424460090&rft.eisbn_list=1424460093&rft_dat=%3Cieee_6IE%3E5532435%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-a1b3345629165c64835263575dd12ad20973404e35841408cb55dc4d4c2efa673%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5532435&rfr_iscdi=true |