Loading…
Hierarchies of indices for text searching
We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called pat array (or suffix array). Considering that...
Saved in:
Published in: | Information systems (Oxford) 1996-09, Vol.21 (6), p.497-514 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | We present an efficient implementation of a recently known index for text databases, when the database is stored on secondary storage devices such as magnetic or optical disks. The implementation is built on top of a new and simple index for texts called pat array (or suffix array). Considering that text searching in a large database spends most of the time accessing external storage devices, we propose additional index structures and searching algorithms for
pat arrays that reduce the number of disk accesses. We present two index structures: a two-level hierarchy model that uses main memory and one level of external storage (magnetic or optical devices) and a three-level hierarchy model that uses main memory and two levels of external storage (magnetic and optical devices). Performance improvement is achieved in both models by storing most of higher index levels in faster memories, thus reducing accesses in the slowest devices in the hierarchy. Analytical and experimental results are presented for both models. For 160 megabytes of text stored on
cd-rom disk the two-level model using 2 megabytes of main memory costs 20% of the
pat array used as a single level. |
---|---|
ISSN: | 0306-4379 1873-6076 |
DOI: | 10.1016/0306-4379(96)00025-7 |