Hidden data states-based complex terminology extraction from textual web data model

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), 2020-06, Vol. 50 (6), p. 1813-1831
Main Authors: Fkih, Fethi; Omri, Mohamed Nazih
Format: Article
Language: English
Description
Summary: To meet the standards of the “semantic web”, which allows data to be shared and reused across applications, it has become necessary to model web text documents at the level of concepts and to exploit available linguistic resources. Extracting semantic tokens makes semantic modelling of web documents possible. Unfortunately, techniques for extracting terminology from unstructured web text still do not deliver strong results. Systems built on classical techniques extract very large numbers of candidate terms and leave the separation of relevant from irrelevant candidates to post-processing. In this paper, we introduce HMM-Extract, a novel model for terminology retrieval based on Markov models. Our model integrates two modules that work in cascade: a module based on a Hidden Markov Model (HMM) for complex term extraction and a module based on a Markov Chain for filtering the terms provided by the HMM. We focus on three main contributions. First, we provide a linguistic and statistical specification of relevant terms. Second, we show that an HMM can be used to extract relevant terms from unstructured textual documents. Finally, we demonstrate the importance of integrating statistical knowledge in a Markov Chain and show, experimentally, its contribution to the field of terminology extraction.
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-019-01568-4