Exploring Semantic Relatedness in Arabic Corpora using Paradigmatic and Syntagmatic Models
Published in: | International Journal of Information Engineering and Electronic Business, 2016-01, Vol. 8 (1), pp. 37-47 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | In this paper we explore two paradigms: first, paradigmatic representation via the native HAL model, including a model enriched with word order information using the permutation technique of Sahlgren et al. [21]; and second, syntagmatic representation via a words-by-documents model constructed using the Random Indexing method. We demonstrate that these word space models, which were initially designed to extract similarity, can also be efficient for extracting relatedness from Arabic corpora. For a given word, the proposed models search for the words related to it. A result is qualified as a failure when the number of related words given by a model is less than or equal to 4; otherwise it is considered a success. To decide whether a word is related to another, we consult an expert in the economic domain and use a glossary of the domain. We first compare a native HAL model with a term-document model: the native HAL model performs better, with a success rate of 72.92%. In a second stage, we boost the HAL model results by adding word order information via the permutation technique of Sahlgren et al. [21]. The success rate of the enriched HAL model reaches 79.2%. |
ISSN: | 2074-9023, 2074-9031 |
DOI: | 10.5815/ijieeb.2016.01.05 |
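
The paradigmatic model in this record is HAL (Hyperspace Analogue to Language). A minimal sketch of a HAL-style model follows, assuming the classic Lund and Burgess formulation: a sliding window of hypothetical size `window`, neighbours weighted `window - distance + 1`, left and right contexts kept separately and concatenated into one vector, and related words ranked by cosine similarity. The function names, window size and toy tokens are illustrative assumptions, not the paper's implementation or its Arabic preprocessing.

```python
import math


def build_hal(tokens, window=5):
    """Return {word: vector}; each vector concatenates the word's weighted
    left-context counts with its weighted right-context counts."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # left[i][j]: accumulated weight of word j occurring before word i
    left = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            # closer neighbours weigh more: window, window - 1, ..., 1
            left[index[word]][index[tokens[pos - dist]]] += window - dist + 1
    vectors = {}
    for w, i in index.items():
        # j follows i exactly when i precedes j, so the right context is column i
        right = [left[j][i] for j in range(n)]
        vectors[w] = left[i] + right
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def related_words(word, vectors, k=5):
    """Return the k words whose HAL vectors are closest (cosine) to `word`."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]
```

On a real corpus, `related_words(query, build_hal(corpus_tokens))` would produce the candidate list that the domain expert and glossary then judge; the value of `k` here is an assumption, since the abstract does not say how many candidates each model returns.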
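
The syntagmatic model in this record is a words-by-documents model built with Random Indexing. A minimal sketch of that construction follows, assuming the standard scheme: each document receives a sparse ternary random index vector, and a word's context vector is the sum of the index vectors of the documents it occurs in. The dimensionality (`dim=1000`), the number of non-zero entries (`nonzeros=10`) and the once-per-occurrence weighting are hypothetical defaults; the abstract does not give the paper's settings.

```python
import random


def index_vector(dim, nonzeros, rng):
    """Sparse ternary random index vector: `nonzeros` random slots set to +1 or -1."""
    vec = [0] * dim
    for slot in rng.sample(range(dim), nonzeros):
        vec[slot] = rng.choice((1, -1))
    return vec


def words_by_documents(documents, dim=1000, nonzeros=10, seed=0):
    """Random Indexing over documents: a word's context vector is the sum of
    the index vectors of the documents it occurs in, added once per occurrence."""
    rng = random.Random(seed)
    context = {}
    for doc in documents:
        doc_vec = index_vector(dim, nonzeros, rng)
        for word in doc:
            acc = context.setdefault(word, [0] * dim)
            for slot, value in enumerate(doc_vec):
                if value:
                    acc[slot] += value
    return context
```

The design choice behind Random Indexing is that the vector length stays fixed at `dim` however many documents the corpus holds, instead of growing with one column per document as an explicit term-document matrix does. Related words can then be ranked by cosine similarity between context vectors.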
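
The word order enrichment cited as the permutation technique of Sahlgren et al. [21] encodes a neighbour's relative position by permuting its random index vector before adding it to the focus word's context vector. A minimal sketch of that general idea follows, using cyclic rotation as the permutation and a sliding window over a random-indexing model. The abstract does not say how the permutations were combined with HAL, so this is a standalone illustration of the technique, not the paper's enriched HAL model. All names and parameter values are hypothetical.

```python
import random


def rotate(vec, k):
    """Permutation Pi^k as a cyclic shift by k slots (negative k shifts left)."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def order_vectors(tokens, dim=1000, nonzeros=10, window=2, seed=0):
    """Return (index, context). Each word's context vector sums its neighbours'
    index vectors, each rotated by the neighbour's signed offset, so that
    'x before y' and 'y before x' leave different traces."""
    rng = random.Random(seed)
    index = {}
    for w in tokens:  # one fixed sparse ternary index vector per word type
        if w not in index:
            vec = [0] * dim
            for slot in rng.sample(range(dim), nonzeros):
                vec[slot] = rng.choice((1, -1))
            index[w] = vec
    context = {w: [0] * dim for w in index}
    for pos, word in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = pos + offset
            if offset == 0 or not 0 <= j < len(tokens):
                continue
            shifted = rotate(index[tokens[j]], offset)
            context[word] = [a + b for a, b in zip(context[word], shifted)]
    return index, context
```

Because `rotate(v, k)` is invertible, the unrotated `index` vectors can still be used to query which words tend to appear at a given offset, which is what makes permutation a cheap way to add order information without enlarging the vectors.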
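
The evaluation rule in the abstract counts, for each query word, how many of a model's suggested words are judged related: 4 or fewer is a failure, more than 4 is a success, and the success rate is the share of successful queries. A minimal sketch of that scoring follows. The `is_related` callback is a hypothetical stand-in for the expert and the economic glossary, and the function names are assumptions.

```python
def is_success(query, suggested, is_related, threshold=4):
    """Success when more than `threshold` suggested words are judged related
    to `query`; `threshold` or fewer related words counts as a failure."""
    related = sum(1 for cand in suggested if is_related(query, cand))
    return related > threshold


def success_rate(suggestions, is_related, threshold=4):
    """Percentage of query words whose suggestion list is a success.

    `suggestions` maps each query word to the list of words a model returned.
    """
    wins = sum(is_success(q, s, is_related, threshold)
               for q, s in suggestions.items())
    return 100.0 * wins / len(suggestions)
```

For example, with a toy glossary in which five of the six words suggested for one query are related and none of the six suggested for a second query are, the first query succeeds, the second fails, and `success_rate` returns 50.0.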