Loading…

Hierarchies of indices for text searching

We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called pat array (or suffix array). Considering that...

Full description

Saved in:
Bibliographic Details
Published in:Information systems (Oxford) 1996-09, Vol.21 (6), p.497-514
Main Authors: Baeza-Yates, Ricardo, Barbosa, Eduardo F., Ziviani, Nivio
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called pat array (or suffix array). Considering that text searching in a large database spends most of the time accessing external storage devices, we propose additional index structures and searching algorithms for pat arrays that reduce the number of disk accesses. We present two index structures: a two-level hierarchy model that uses main memory and one level of external storage (magnetic or optical devices) and a three-level hierarchy model that uses main memory and two levels of external storage (magnetic and optical devices). Performance improvement is achieved in both models by storing most of higher index levels in faster memories, thus reducing accesses in the slowest devices in the hierarchy. Analytical and experimental results are presented for both models. For 160 megabytes of text stored on cd-rom disk the two-level model using 2 megabytes of main memory costs 20% of the pat array used as a single level.
ISSN:0306-4379
1873-6076
DOI:10.1016/0306-4379(96)00025-7