Loading…

Accurate summary-based cardinality estimation through the lens of cardinality estimation graphs

This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence as...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the VLDB Endowment 2022-04, Vol.15 (8), p.1533-1545
Main Authors: Chen, Jeremy, Huang, Yuqing, Wang, Mushi, Salihoglu, Semih, Salem, Ken
Format: Article
Language:English
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper is an experimental and analytical study of two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information theoretic linear programs (LPs). We begin by analyzing how optimistic estimators use pre-computed statistics to generate cardinality estimates. We show these estimators can be modeled as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains sub-queries as nodes and edges whose weights are average degree statistics. We show that existing optimistic estimators have either undefined or fixed choices for picking CEG paths as their estimates and ignore alternative choices. Instead, we outline a space of optimistic estimators to make an estimate on CEGs, which subsumes existing estimators. We show, using an extensive empirical analysis, that effective paths depend on the structure of the queries. While on acyclic queries and queries with small-size cycles, using the maximum-weight path is effective to address the well known underestimation problem, on queries with larger cycles these estimates tend to overestimate, which can be addressed by using minimum weight paths. We next show that optimistic estimators and seemingly disparate LP-based pessimistic estimators are in fact connected. Specifically, we show that CEGs can also model some recent pessimistic estimators. This connection allows us to adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that they have combinatorial solutions.
ISSN:2150-8097
2150-8097
DOI:10.14778/3529337.3529339