MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology
Published in: | International journal of information technology (Singapore. Online) 2018-09, Vol.10 (3), p.303-311 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Multiword expressions are an omnipresent element of natural language, and treating them as a linguistic resource is important in various applications. This paper presents an architecture, MwTExt, for the automatic extraction of multi-word terms (MWTs) from such expressions in unannotated English documents. Natural Language Processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on lexical patterns such as (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can then be used to form compound concepts within an ontology. The lexical descriptions of the MWTs are encoded in the Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and its results are compared with those of Text2Onto, an ontology learning tool, and of term extractors such as TermRaider and TerMine. The results indicate that MwTExt extracts accurate lexicalized MWTs more effectively than these tools, with an average precision of 97%. |
---|---|
ISSN: | 2511-2104 2511-2112 |
DOI: | 10.1007/s41870-018-0111-6 |
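
The pattern-based extraction described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the text has already been tokenized and tagged with Penn Treebank part-of-speech tags, it treats any `NN*` tag as a noun and `IN` as a preposition, and the names `PATTERNS`, `coarse` and `extract_mwts` are hypothetical. It only shows how the three lexical patterns named in the summary could be matched over a tagged sentence, preferring the longest pattern at each position.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# The three lexical patterns named in the summary, longest first so that
# a longer match is preferred over a shorter one starting at the same token.
PATTERNS = [
    ("N", "P", "N", "P", "N"),  # Noun Preposition Noun Preposition Noun
    ("N", "P", "N", "N"),       # Noun Preposition Noun + Noun
    ("N", "P", "N"),            # Noun Preposition Noun
]


def coarse(tag: str) -> str:
    """Map a Penn Treebank tag to a coarse class (assumption of this sketch)."""
    if tag.startswith("NN"):
        return "N"  # NN, NNS, NNP, NNPS
    if tag == "IN":
        return "P"  # note: IN also covers subordinating conjunctions
    return "O"      # anything else


def extract_mwts(tagged: List[Token]) -> List[str]:
    """Return candidate multi-word terms matching one of PATTERNS."""
    classes = [coarse(tag) for _, tag in tagged]
    found = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(classes[i:i + n]) == pattern:
                found.append(" ".join(word for word, _ in tagged[i:i + n]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no pattern starts here; move to the next token
    return found


# Hand-tagged example sentence (the tags are supplied by hand, not by a tagger).
sentence = [
    ("The", "DT"), ("theory", "NN"), ("of", "IN"), ("computation", "NN"),
    ("underlies", "VBZ"), ("the", "DT"), ("design", "NN"), ("of", "IN"),
    ("database", "NN"), ("systems", "NNS"), (".", "."),
]
print(extract_mwts(sentence))
```

For the hand-tagged sentence above, the sketch yields the candidates "theory of computation" and "design of database systems". A real pipeline would obtain the tags from a shallow parser and would need additional filtering before any term is encoded as a compound concept in OWL/XML.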