Loading…

SK-LSH: an efficient index structure for approximate nearest neighbor search

Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradigm in many applications. Recently, Locality Sensitive Hashing (LSH) and its variants are acknowledged as the most promising solutions to ANN search. However, state-of-the-art LSH approaches suffer from...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the VLDB Endowment 2014-05, Vol.7 (9), p.745-756
Main Authors: Liu, Yingfan, Cui, Jiangtao, Huang, Zi, Li, Hui, Shen, Heng Tao
Format: Article
Language:English
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites cdi_FETCH-LOGICAL-c196t-5bc2ffacdf239d129e7957bc4e343d370f37aa1609d1727ef0be6c75ec9c17013
container_end_page 756
container_issue 9
container_start_page 745
container_title Proceedings of the VLDB Endowment
container_volume 7
creator Liu, Yingfan
Cui, Jiangtao
Huang, Zi
Li, Hui
Shen, Heng Tao
description Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradigm in many applications. Recently, Locality Sensitive Hashing (LSH) and its variants are acknowledged as the most promising solutions to ANN search. However, state-of-the-art LSH approaches suffer from a drawback: accesses to candidate objects require a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, called SortingKeys-LSH (SK-LSH), which reduces the number of page accesses through locally arranging candidate objects. We firstly define a new measure to evaluate the distance between the compound hash keys of two points. A linear order relationship on the set of compound hash keys is then created, and the corresponding data points can be sorted accordingly. Hence, data points that are close to each other according to the distance measure can be stored locally in an index file. During the ANN search, only a limited number of disk pages among few index files are necessary to be accessed for sufficient candidate generation and verification, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of SK-LSH for the ANN search, compared with state-of-the-art methods, including LSB, C2LSH and CK-Means.
doi_str_mv 10.14778/2732939.2732947
format article
fullrecord <record><control><sourceid>crossref</sourceid><recordid>TN_cdi_crossref_primary_10_14778_2732939_2732947</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_14778_2732939_2732947</sourcerecordid><originalsourceid>FETCH-LOGICAL-c196t-5bc2ffacdf239d129e7957bc4e343d370f37aa1609d1727ef0be6c75ec9c17013</originalsourceid><addsrcrecordid>eNpNj0uLwjAURsOgg49x75-I3pvb5jZLKb6YwizUdUjTBEZmUBo3_nulduHqfPDBgSPEHGGBGXOxVEzKkFl0zPhDjBXmIAswPHjbIzFJ6QygC43FWHwevmV12H2JYXR_Kcx6TsVpsz6WO1n9bPflqpIejb7JvPYqRuebqMg0qExgk3Pts0AZNcQQiZ1DDc-TFYcIddCe8-CNRwakqYCX17eXlNoQ7bX9_Xft3SLYrsP2HbbvoAeqtTfo</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>SK-LSH: an efficient index structure for approximate nearest neighbor search</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Liu, Yingfan ; Cui, Jiangtao ; Huang, Zi ; Li, Hui ; Shen, Heng Tao</creator><creatorcontrib>Liu, Yingfan ; Cui, Jiangtao ; Huang, Zi ; Li, Hui ; Shen, Heng Tao</creatorcontrib><description>Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradigm in many applications. Recently, Locality Sensitive Hashing (LSH) and its variants are acknowledged as the most promising solutions to ANN search. However, state-of-the-art LSH approaches suffer from a drawback: accesses to candidate objects require a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, called SortingKeys-LSH (SK-LSH), which reduces the number of page accesses through locally arranging candidate objects. We firstly define a new measure to evaluate the distance between the compound hash keys of two points. A linear order relationship on the set of compound hash keys is then created, and the corresponding data points can be sorted accordingly. Hence, data points that are close to each other according to the distance measure can be stored locally in an index file. During the ANN search, only a limited number of disk pages among few index files are necessary to be accessed for sufficient candidate generation and verification, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of SK-LSH for the ANN search, compared with state-of-the-art methods, including LSB, C2LSH and CK-Means.</description><identifier>ISSN: 2150-8097</identifier><identifier>EISSN: 2150-8097</identifier><identifier>DOI: 10.14778/2732939.2732947</identifier><language>eng</language><ispartof>Proceedings of the VLDB Endowment, 2014-05, Vol.7 (9), p.745-756</ispartof><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c196t-5bc2ffacdf239d129e7957bc4e343d370f37aa1609d1727ef0be6c75ec9c17013</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Liu, Yingfan</creatorcontrib><creatorcontrib>Cui, Jiangtao</creatorcontrib><creatorcontrib>Huang, Zi</creatorcontrib><creatorcontrib>Li, Hui</creatorcontrib><creatorcontrib>Shen, Heng Tao</creatorcontrib><title>SK-LSH: an efficient index structure for approximate nearest neighbor search</title><title>Proceedings of the VLDB Endowment</title><description>Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradigm in many applications. Recently, Locality Sensitive Hashing (LSH) and its variants are acknowledged as the most promising solutions to ANN search. However, state-of-the-art LSH approaches suffer from a drawback: accesses to candidate objects require a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, called SortingKeys-LSH (SK-LSH), which reduces the number of page accesses through locally arranging candidate objects. We firstly define a new measure to evaluate the distance between the compound hash keys of two points. A linear order relationship on the set of compound hash keys is then created, and the corresponding data points can be sorted accordingly. Hence, data points that are close to each other according to the distance measure can be stored locally in an index file. During the ANN search, only a limited number of disk pages among few index files are necessary to be accessed for sufficient candidate generation and verification, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of SK-LSH for the ANN search, compared with state-of-the-art methods, including LSB, C2LSH and CK-Means.</description><issn>2150-8097</issn><issn>2150-8097</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNpNj0uLwjAURsOgg49x75-I3pvb5jZLKb6YwizUdUjTBEZmUBo3_nulduHqfPDBgSPEHGGBGXOxVEzKkFl0zPhDjBXmIAswPHjbIzFJ6QygC43FWHwevmV12H2JYXR_Kcx6TsVpsz6WO1n9bPflqpIejb7JvPYqRuebqMg0qExgk3Pts0AZNcQQiZ1DDc-TFYcIddCe8-CNRwakqYCX17eXlNoQ7bX9_Xft3SLYrsP2HbbvoAeqtTfo</recordid><startdate>20140501</startdate><enddate>20140501</enddate><creator>Liu, Yingfan</creator><creator>Cui, Jiangtao</creator><creator>Huang, Zi</creator><creator>Li, Hui</creator><creator>Shen, Heng Tao</creator><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20140501</creationdate><title>SK-LSH</title><author>Liu, Yingfan ; Cui, Jiangtao ; Huang, Zi ; Li, Hui ; Shen, Heng Tao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c196t-5bc2ffacdf239d129e7957bc4e343d370f37aa1609d1727ef0be6c75ec9c17013</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Liu, Yingfan</creatorcontrib><creatorcontrib>Cui, Jiangtao</creatorcontrib><creatorcontrib>Huang, Zi</creatorcontrib><creatorcontrib>Li, Hui</creatorcontrib><creatorcontrib>Shen, Heng Tao</creatorcontrib><collection>CrossRef</collection><jtitle>Proceedings of the VLDB Endowment</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Yingfan</au><au>Cui, Jiangtao</au><au>Huang, Zi</au><au>Li, Hui</au><au>Shen, Heng Tao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SK-LSH: an efficient index structure for approximate nearest neighbor search</atitle><jtitle>Proceedings of the VLDB Endowment</jtitle><date>2014-05-01</date><risdate>2014</risdate><volume>7</volume><issue>9</issue><spage>745</spage><epage>756</epage><pages>745-756</pages><issn>2150-8097</issn><eissn>2150-8097</eissn><abstract>Approximate Nearest Neighbor (ANN) search in high dimensional space has become a fundamental paradigm in many applications. Recently, Locality Sensitive Hashing (LSH) and its variants are acknowledged as the most promising solutions to ANN search. However, state-of-the-art LSH approaches suffer from a drawback: accesses to candidate objects require a large number of random I/O operations. In order to guarantee the quality of returned results, sufficient objects should be verified, which would consume enormous I/O cost. To address this issue, we propose a novel method, called SortingKeys-LSH (SK-LSH), which reduces the number of page accesses through locally arranging candidate objects. We firstly define a new measure to evaluate the distance between the compound hash keys of two points. A linear order relationship on the set of compound hash keys is then created, and the corresponding data points can be sorted accordingly. Hence, data points that are close to each other according to the distance measure can be stored locally in an index file. During the ANN search, only a limited number of disk pages among few index files are necessary to be accessed for sufficient candidate generation and verification, which not only significantly reduces the response time but also improves the accuracy of the returned results. Our exhaustive empirical study over several real-world data sets demonstrates the superior efficiency and accuracy of SK-LSH for the ANN search, compared with state-of-the-art methods, including LSB, C2LSH and CK-Means.</abstract><doi>10.14778/2732939.2732947</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2014-05, Vol.7 (9), p.745-756
issn 2150-8097
2150-8097
language eng
recordid cdi_crossref_primary_10_14778_2732939_2732947
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
title SK-LSH: an efficient index structure for approximate nearest neighbor search
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T17%3A50%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SK-LSH:%20an%20efficient%20index%20structure%20for%20approximate%20nearest%20neighbor%20search&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Liu,%20Yingfan&rft.date=2014-05-01&rft.volume=7&rft.issue=9&rft.spage=745&rft.epage=756&rft.pages=745-756&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/2732939.2732947&rft_dat=%3Ccrossref%3E10_14778_2732939_2732947%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c196t-5bc2ffacdf239d129e7957bc4e343d370f37aa1609d1727ef0be6c75ec9c17013%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true