Loading…

Exploiting lexical patterns for knowledge graph construction from unstructured text in Spanish

Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processi...

Full description

Saved in:
Bibliographic Details
Published in:Complex & intelligent systems 2023-04, Vol.9 (2), p.1281-1297
Main Authors: Rios-Alvarado, Ana B., Martinez-Rodriguez, Jose L., Garcia-Perez, Andrea G., Guerrero-Melendez, Tania Y., Lopez-Arevalo, Ivan, Gonzalez-Compean, Jose Luis
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text as it has no semantic descriptions. Generating KGs from texts written in Spanish represents a research challenge as the existing structures, models, and strategies designed for other languages are not compatible in this scenario. This paper proposes a method to design and construct KGs from unstructured text in Spanish. We defined lexical patterns to extract named entities and (non) taxonomic, equivalence, and composition relations. Next, named entities are linked and enriched with DBpedia resources through a strategy based on SPARQL queries. Finally, OWL properties are defined from the predicate relations for creating resource description framework (RDF) triples. We evaluated the performance of the proposed method to determine the degree of elements extracted from the input text and to assess their quality through standard information retrieval measures. The evaluation revealed the feasibility of the proposed method to extract RDF triples from datasets in general and computer science domains. Competitive results were observed by comparing our method regarding an existing approach from the literature.
ISSN:2199-4536
2198-6053
DOI:10.1007/s40747-022-00805-7