Loading…

Big Data Full-Text Search Index Minimization Using Text Summarization

An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storagecost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertainfull-text search queries with good performance. It also incurs overhead to store...

Full description

Saved in:
Bibliographic Details
Published in:Information technology and control 2021-06, Vol.50 (2), p.375-389
Main Authors: Iqbal, Waheed, Ilyas Malik, Waqas, Bukhari, Faisal, Almustafa, Khaled Mohamad, Nawaz, Zubiar
Format: Article
Language:English
Citations: Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:An efficient full-text search is achieved by indexing the raw data with an additional 20 to 30 percent storagecost. In the context of Big Data, this additional storage space is huge and introduces challenges to entertainfull-text search queries with good performance. It also incurs overhead to store, manage, and update the largesize index. In this paper, we propose and evaluate a method to minimize the index size to offer full-text searchover Big Data using an automatic extractive-based text summarization method. To evaluate the effectivenessof the proposed approach, we used two real-world datasets. We indexed actual and summarized datasets usingApache Lucene and studied average simple overlapping, Spearman’s rho correlation, and average rankingscore measures of search results obtained using different search queries. Our experimental evaluation showsthat automatic text summarization is an effective method to reduce the index size significantly. We obtained amaximum of 82% reduction in index size with 42% higher relevance of the search results using the proposedsolution to minimize the full-text index size.
ISSN:1392-124X
2335-884X
DOI:10.5755/j01.itc.50.2.25470