Loading…

Efficient algorithms to execute complex similarity queries in RDBMS

Search operations in large sets of complex objects usually rely on similarity-based criteria, due to the lack of other general properties that could be used to compare the objects, such as the total order relationship, or even the equality relationship between pairs of objects, commonly used with da...

Full description

Saved in:
Bibliographic Details
Published in:Journal of the Brazilian Computer Society 2004-04, Vol.9 (3), p.5-24
Main Authors: Arantes, Adriano S., Vieira, Marcos R., Traina Jr, Caetano, Traina, Agma J. M.
Format: Article
Language:eng ; por
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Search operations in large sets of complex objects usually rely on similarity-based criteria, due to the lack of other general properties that could be used to compare the objects, such as the total order relationship, or even the equality relationship between pairs of objects, commonly used with data in numeric or short texts domains. Therefore, similarity between objects is the core criterion to compare complex objects. There are two basic operators for similarity queries: Range Query and k-Nearest Neighbors Query. Much research has been done to develop effective algorithms to implement them as standalone operations. However, algorithms to support these operators as parts of more complex expressions involving their composition were not developed yet. This paper presents two new algorithms specially designed to answer conjunctive and disjunctive operations involving the basic similarity criteria, providing also support for the manipulation of tie lists when the k-Nearest Neighbor query is involved. The new proposed algorithms were compared with the combinations of the basic algorithms, both in the sequential scan and in the Slim-tree metric access methods, measuring the number of disk accesses, the number of distance calculations, and wall-clock time. The experimental results show that the new algorithms have better performance than the composition of the two basic operators to answer complex similarity queries in all measured aspects, being up to 40 times faster than the composition of the basic algorithms. This is an essential point to enable the practical use of similarity operators in Relational Database Management Systems.
ISSN:0104-6500
1678-4804
DOI:10.1590/S0104-65002004000100002