Loading…

Efficient construction of histograms for multidimensional data using quad-trees

Histograms can be useful in estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the use of the quad-tree, which is a popular index structu...

Full description

Saved in:
Bibliographic Details
Published in:Decision Support Systems 2011-12, Vol.52 (1), p.82-94
Main Authors: Roh, Yohan J., Kim, Jae Ho, Son, Jin Hyun, Kim, Myoung Ho
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Histograms can be useful in estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the use of the quad-tree, which is a popular index structure for multidimensional data sets. The use of the compact representation of the target data obtainable from the quad-tree allows a fast construction of a histogram with the minimum number of scanning, i.e., only one scanning, of the underlying data. In addition to the advantage of computation time, the proposed method also provides a better performance than other existing methods with respect to the quality of selectivity estimation. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew. Then, we provide an effective technique for skew-tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with other existing methods using both real-life data sets and synthetic data sets. The results of experiments show that the proposed method generally provides a better performance than other existing methods in terms of accuracy as well as computational efficiency. ► Histograms can be useful in estimating the selectivity of range queries.► A new multidimensional histogram method based on the use of the quad-tree.► The use of the quad-tree can allow a fast construction of a histogram.► The proposed method generally shows better performance than other existing methods.
ISSN:0167-9236
1873-5797
DOI:10.1016/j.dss.2011.05.006