Deep unsupervised cardinality estimation

Cardinality estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. However, direct application of these models...

Bibliographic Details
Published in: Proceedings of the VLDB Endowment 2019-11, Vol.13 (3), p.279-292
Main Authors: Yang, Zongheng; Liang, Eric; Kamsetty, Amog; Wu, Chenggang; Duan, Yan; Chen, Xi; Abbeel, Pieter; Hellerstein, Joseph M.; Krishnan, Sanjay; Stoica, Ion
Format: Article
Language: English
container_end_page 292
container_issue 3
container_start_page 279
container_title Proceedings of the VLDB Endowment
container_volume 13
creator Yang, Zongheng
Liang, Eric
Kamsetty, Amog
Wu, Chenggang
Duan, Yan
Chen, Xi
Abbeel, Pieter
Hellerstein, Joseph M.
Krishnan, Sanjay
Stoica, Ion
description Cardinality estimation has long been grounded in statistical tools for density estimation. To capture the rich multivariate distributions of relational tables, we propose the use of a new type of high-capacity statistical model: deep autoregressive models. However, direct application of these models leads to a limited estimator that is prohibitively expensive to evaluate for range or wildcard predicates. To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more. Like classical synopses, our estimator summarizes the data without supervision. Unlike previous solutions, we approximate the joint data distribution without any independence assumptions. Evaluated on real-world datasets and compared against real systems and dominant families of techniques, our estimator achieves single-digit multiplicative error at tail, an up to 90x accuracy improvement over the second best method, and is space- and runtime-efficient.
doi_str_mv 10.14778/3368289.3368294
format article
identifier ISSN: 2150-8097
ispartof Proceedings of the VLDB Endowment, 2019-11, Vol.13 (3), p.279-292
issn 2150-8097
language eng
recordid cdi_crossref_primary_10_14778_3368289_3368294
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
title Deep unsupervised cardinality estimation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T00%3A41%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Deep%20unsupervised%20cardinality%20estimation&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.au=Yang,%20Zongheng&rft.date=2019-11-01&rft.volume=13&rft.issue=3&rft.spage=279&rft.epage=292&rft.pages=279-292&rft.issn=2150-8097&rft.eissn=2150-8097&rft_id=info:doi/10.14778/3368289.3368294&rft_dat=%3Ccrossref%3E10_14778_3368289_3368294%3C/crossref%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c243t-9b8fe2041ce8e7766ec6eb1d8edc6852ff92d543a5a15032223b0e0487d782873%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true