Loading…

BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion

Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion , which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors....

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on database systems 2016-04, Vol.41 (1), p.1-44
Main Authors: Zhou, Xiaoling, Qin, Jianbin, Xiao, Chuan, Wang, Wei, Lin, Xuemin, Ishikawa, Yoshiharu
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites cdi_FETCH-LOGICAL-c220t-964f12eb452656c0fb782d4310ba2375b2f70d25919115ed1266ff5305b786e83
container_end_page 44
container_issue 1
container_start_page 1
container_title ACM transactions on database systems
container_volume 41
creator Zhou, Xiaoling
Qin, Jianbin
Xiao, Chuan
Wang, Wei
Lin, Xuemin
Ishikawa, Yoshiharu
description Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion , which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors. In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.
doi_str_mv 10.1145/2877201
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1808054133</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1808054133</sourcerecordid><originalsourceid>FETCH-LOGICAL-c220t-964f12eb452656c0fb782d4310ba2375b2f70d25919115ed1266ff5305b786e83</originalsourceid><addsrcrecordid>eNotj0tLAzEURi9SwbGKP0M30XtvcpPMspb6gIIbdRvmkUBl6tSkXfjvHWlX3-bwHQ7ADeE9kZEH9s4x0hlUJOKUscbMoEJtWUlNcgGXpXwhovG1q2D2uPpcXMF5aoYSr087h4-n1fvyRa3fnl-Xi7XqmHGvamsScWyNsBXbYWqd595owrZh7aTl5LDnyVITSeyJrU1JNMoE2uj1HO6Ov7s8_hxi2YftpnRxGJrvOB5KII8exZDWE3p7RLs8lpJjCru82Tb5NxCG_85w6tR_4YY_aA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1808054133</pqid></control><display><type>article</type><title>BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><source>BSC - Ebsco (Business Source Ultimate)</source><creator>Zhou, Xiaoling ; Qin, Jianbin ; Xiao, Chuan ; Wang, Wei ; Lin, Xuemin ; Ishikawa, Yoshiharu</creator><creatorcontrib>Zhou, Xiaoling ; Qin, Jianbin ; Xiao, Chuan ; Wang, Wei ; Lin, Xuemin ; Ishikawa, Yoshiharu</creatorcontrib><description>Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion , which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors. In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.</description><identifier>ISSN: 0362-5915</identifier><identifier>EISSN: 1557-4644</identifier><identifier>DOI: 10.1145/2877201</identifier><language>eng</language><subject>Algorithms ; Data structures ; Partitioning ; Query processing ; Search engines ; Searching ; Thresholds</subject><ispartof>ACM transactions on database systems, 2016-04, Vol.41 (1), p.1-44</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c220t-964f12eb452656c0fb782d4310ba2375b2f70d25919115ed1266ff5305b786e83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Zhou, Xiaoling</creatorcontrib><creatorcontrib>Qin, Jianbin</creatorcontrib><creatorcontrib>Xiao, Chuan</creatorcontrib><creatorcontrib>Wang, Wei</creatorcontrib><creatorcontrib>Lin, Xuemin</creatorcontrib><creatorcontrib>Ishikawa, Yoshiharu</creatorcontrib><title>BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion</title><title>ACM transactions on database systems</title><description>Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion , which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors. In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.</description><subject>Algorithms</subject><subject>Data structures</subject><subject>Partitioning</subject><subject>Query processing</subject><subject>Search engines</subject><subject>Searching</subject><subject>Thresholds</subject><issn>0362-5915</issn><issn>1557-4644</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNotj0tLAzEURi9SwbGKP0M30XtvcpPMspb6gIIbdRvmkUBl6tSkXfjvHWlX3-bwHQ7ADeE9kZEH9s4x0hlUJOKUscbMoEJtWUlNcgGXpXwhovG1q2D2uPpcXMF5aoYSr087h4-n1fvyRa3fnl-Xi7XqmHGvamsScWyNsBXbYWqd595owrZh7aTl5LDnyVITSeyJrU1JNMoE2uj1HO6Ov7s8_hxi2YftpnRxGJrvOB5KII8exZDWE3p7RLs8lpJjCru82Tb5NxCG_85w6tR_4YY_aA</recordid><startdate>20160401</startdate><enddate>20160401</enddate><creator>Zhou, Xiaoling</creator><creator>Qin, Jianbin</creator><creator>Xiao, Chuan</creator><creator>Wang, Wei</creator><creator>Lin, Xuemin</creator><creator>Ishikawa, Yoshiharu</creator><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20160401</creationdate><title>BEVA</title><author>Zhou, Xiaoling ; Qin, Jianbin ; Xiao, Chuan ; Wang, Wei ; Lin, Xuemin ; Ishikawa, Yoshiharu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c220t-964f12eb452656c0fb782d4310ba2375b2f70d25919115ed1266ff5305b786e83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Algorithms</topic><topic>Data structures</topic><topic>Partitioning</topic><topic>Query processing</topic><topic>Search engines</topic><topic>Searching</topic><topic>Thresholds</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhou, Xiaoling</creatorcontrib><creatorcontrib>Qin, Jianbin</creatorcontrib><creatorcontrib>Xiao, Chuan</creatorcontrib><creatorcontrib>Wang, Wei</creatorcontrib><creatorcontrib>Lin, Xuemin</creatorcontrib><creatorcontrib>Ishikawa, Yoshiharu</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>ACM transactions on database systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhou, Xiaoling</au><au>Qin, Jianbin</au><au>Xiao, Chuan</au><au>Wang, Wei</au><au>Lin, Xuemin</au><au>Ishikawa, Yoshiharu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion</atitle><jtitle>ACM transactions on database systems</jtitle><date>2016-04-01</date><risdate>2016</risdate><volume>41</volume><issue>1</issue><spage>1</spage><epage>44</epage><pages>1-44</pages><issn>0362-5915</issn><eissn>1557-4644</eissn><abstract>Query autocompletion has become a standard feature in many search applications, especially for search engines. A recent trend is to support the error-tolerant autocompletion , which increases the usability significantly by matching prefixes of database strings and allowing a small number of errors. In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. In order to support large distance thresholds, we devise a partitioning scheme to reduce the size and construction cost of the automaton, which results in the universal partitioned EVA (UPEVA) to handle arbitrarily large thresholds. Our extensive evaluation demonstrates that our proposed method outperforms existing approaches in both space and time efficiencies.</abstract><doi>10.1145/2877201</doi><tpages>44</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0362-5915
ispartof ACM transactions on database systems, 2016-04, Vol.41 (1), p.1-44
issn 0362-5915
1557-4644
language eng
recordid cdi_proquest_miscellaneous_1808054133
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); BSC - Ebsco (Business Source Ultimate)
subjects Algorithms
Data structures
Partitioning
Query processing
Search engines
Searching
Thresholds
title BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T23%3A00%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=BEVA:%20An%20Efficient%20Query%20Processing%20Algorithm%20for%20Error-Tolerant%20Autocompletion&rft.jtitle=ACM%20transactions%20on%20database%20systems&rft.au=Zhou,%20Xiaoling&rft.date=2016-04-01&rft.volume=41&rft.issue=1&rft.spage=1&rft.epage=44&rft.pages=1-44&rft.issn=0362-5915&rft.eissn=1557-4644&rft_id=info:doi/10.1145/2877201&rft_dat=%3Cproquest_cross%3E1808054133%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c220t-964f12eb452656c0fb782d4310ba2375b2f70d25919115ed1266ff5305b786e83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1808054133&rft_id=info:pmid/&rfr_iscdi=true