
Multifunctional cube index for information retrieval and clustering

Bibliographic Details
Main Authors: Karthika, N., Janet, B., Kumar, Rohit
Format: Conference Proceeding
Language: English
Description
Summary: The rapid growth of the internet has produced a rising volume of unstructured text data associated with multidimensional information. This situation requires an index capable of fast retrieval and analysis of documents. This paper proposes a cube index that integrates information retrieval and analysis over unstructured text data. The cube index model stores the information as a cube for data processing or mining. The cube index is built on the HDF5 format, which offers internal compression as one of its powerful features. In this model, the direct index, wordpair index, and next-word index are unified into a single cube index that is multidimensional in nature. The cube index is materialized on the well-known FIRE English 2011 dataset, with a storage size 11.82% smaller than that of the inverted wordpair index. With the proposed cube index, recall increases by 54.1% compared with a wordpair index and by 46.1% compared with an inverted index. The resulting set of retrieved documents is clustered using cosine similarity. The results show the promise of the proposed model for unstructured multidimensional text data.
ISSN: 0094-243X, 1551-7616
DOI: 10.1063/5.0168252
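
Illustration: The summary names three component indexes (direct, wordpair, next-word) and cosine similarity for clustering the retrieved documents. The following is a minimal Python sketch of those ideas on a toy corpus, not the authors' implementation. The function names, the whitespace tokenizer, the sample documents, and the exact shape of each index are assumptions made for illustration only; the HDF5 storage layer described in the summary is omitted, and plain in-memory dictionaries are used instead.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_indexes(docs):
    """Build three toy indexes from a {doc_id: text} mapping.

    direct:    doc_id -> term frequencies (assumed reading of "direct index")
    wordpair:  (word, following word) -> set of doc_ids containing that pair
    next_word: word -> set of words seen immediately after it
    """
    direct = {}
    wordpair = defaultdict(set)
    next_word = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        direct[doc_id] = Counter(tokens)
        # Walk over adjacent token pairs.
        for first, second in zip(tokens, tokens[1:]):
            wordpair[(first, second)].add(doc_id)
            next_word[first].add(second)
    return direct, wordpair, next_word


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from the FIRE dataset.
docs = {
    "d1": "cube index for text retrieval",
    "d2": "cube index for text clustering",
    "d3": "compressed storage format",
}

direct, wordpair, next_word = build_indexes(docs)

# Phrase lookup via the wordpair index, then pairwise similarity
# between the retrieved documents as a basis for clustering.
hits = sorted(wordpair[("cube", "index")])
similarity = cosine_similarity(direct[hits[0]], direct[hits[1]])
print(hits, similarity)
```

In this sketch, a phrase query is answered from the wordpair index, and the similarity scores between the retrieved documents could then feed any standard clustering step. How the paper actually combines the three indexes into a cube is not specified in this record.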