BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion
Published in: ACM Transactions on Database Systems, 2016-04, Vol. 41 (1), pp. 1-44
Format: Article
Language: English
Summary: Query autocompletion has become a standard feature in many search applications, especially search engines. A recent trend is to support error-tolerant autocompletion, which significantly increases usability by matching prefixes of database strings while allowing a small number of errors.
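To make the matching criterion concrete, the Python sketch below assumes the usual formulation: a database string s matches a query q under an edit distance threshold τ when some prefix of s is within edit distance τ of q (the prefix edit distance). It scans every string, so it is only a reference baseline for checking results, not an efficient method; the function names are illustrative.

```python
from typing import List


def prefix_edit_distance(query: str, s: str) -> int:
    """Minimum over all prefixes p of s (including "" and s) of ed(query, p)."""
    # row[j] holds ed(query[:i], s[:j]) for the query characters processed so far.
    row = list(range(len(s) + 1))
    for i, qc in enumerate(query, start=1):
        new = [i]
        for j, sc in enumerate(s, start=1):
            new.append(min(row[j] + 1,                # delete qc
                           new[j - 1] + 1,            # insert sc
                           row[j - 1] + (qc != sc)))  # match or substitute
        row = new
    # After the whole query, row[j] = ed(query, s[:j]); take the best prefix.
    return min(row)


def complete_by_scan(query: str, strings: List[str], tau: int) -> List[str]:
    """Linear-scan baseline: keep strings whose prefix edit distance is <= tau."""
    return [s for s in strings if prefix_edit_distance(query, s) <= tau]
```

With the strings database, data, dating and theory, query "dta" and τ = 1, the scan keeps the first three: the prefix "da" is one deletion away from "dta", while every prefix of "theory" needs at least two edits.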
In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by entirely eliminating ancestor-descendant relationships among active nodes. In addition, we characterize the essence of edit distance computation with a novel data structure named the edit vector automaton (EVA), which enables us to compute new active nodes and their associated states efficiently by table lookups. To support large distance thresholds, we devise a partitioning scheme that reduces the size and construction cost of the automaton, resulting in the universal partitioned EVA (UPEVA), which handles arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiency.
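Active nodes and the redundancy of ancestor-descendant pairs can be illustrated on a character trie. The sketch below is a simplified, hypothetical illustration and not the BEVA algorithm. It assumes an active node is a trie node whose prefix is within edit distance τ of the whole query, finds all of them with one dynamic-programming row over the query per trie node, and then keeps only active nodes that have no active ancestor, since a descendant's completions are already contained in its ancestor's subtree. The paper's method instead computes new active nodes and their states by table lookups on the edit vector automaton; the sketch does not attempt that, and all names in it are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    word: Optional[str] = None  # set when a dictionary string ends at this node


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def active_nodes(root: TrieNode, query: str, tau: int) -> List[Tuple[str, TrieNode]]:
    """All (prefix, node) pairs with ed(query, prefix) <= tau.

    Each visited node carries a DP row over the query: row[i] = ed(query[:i], prefix).
    min(row) never decreases as the prefix grows, so a subtree is pruned once
    min(row) > tau.
    """
    found = []
    stack = [("", root, list(range(len(query) + 1)))]
    while stack:
        prefix, node, row = stack.pop()
        if row[-1] <= tau:
            found.append((prefix, node))
        if min(row) > tau:
            continue  # no descendant can become active
        for ch, child in node.children.items():
            new = [row[0] + 1]
            for i, qc in enumerate(query, start=1):
                new.append(min(row[i] + 1,                # insert ch
                               new[i - 1] + 1,            # delete query[i-1]
                               row[i - 1] + (qc != ch)))  # match or substitute
            stack.append((prefix + ch, child, new))
    return found


def topmost(active: List[Tuple[str, TrieNode]]) -> List[Tuple[str, TrieNode]]:
    """Drop every active node that has an active proper ancestor."""
    prefixes = {p for p, _ in active}
    return [(p, n) for p, n in active
            if not any(p[:k] in prefixes for k in range(len(p)))]


def subtree_words(node: TrieNode) -> List[str]:
    """Collect all dictionary strings stored in the subtree rooted at node."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.word is not None:
            out.append(n.word)
        stack.extend(n.children.values())
    return out


def complete_via_trie(root: TrieNode, query: str, tau: int) -> List[str]:
    results = set()
    for _, node in topmost(active_nodes(root, query, tau)):
        results.update(subtree_words(node))
    return sorted(results)
```

For a trie over database, data, dating and theory with query "dta" and τ = 1, the active prefixes are "da" and "data"; only "da" survives the ancestor filter, and its subtree alone yields the completions data, database and dating.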
ISSN: 0362-5915, 1557-4644
DOI: 10.1145/2877201