Loading…

Cardinality estimation via learned dynamic sample selection

Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been...

Full description

Saved in:
Bibliographic Details
Published in:Information systems (Oxford) 2023-07, Vol.117, p.102252, Article 102252
Main Authors: Wang, Run-An, Zou, Zhaonian, Jing, Ziqi
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites cdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3
container_end_page
container_issue
container_start_page 102252
container_title Information systems (Oxford)
container_volume 117
creator Wang, Run-An
Zou, Zhaonian
Jing, Ziqi
description Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods. •We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.
doi_str_mv 10.1016/j.is.2023.102252
format article
fullrecord <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_is_2023_102252</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0306437923000881</els_id><sourcerecordid>S0306437923000881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</originalsourceid><addsrcrecordid>eNp1j01LxDAQQIMoWFfvHvsHWqf5aBM9SdFVWPCi55AmU0jpx5KUhf57W-rV0zAwb3iPkMcC8gKK8qnLfcwpULaulAp6RZJCViwroSqvSQIMyoyzSt2Suxg7AKBCqYS81CY4P5rez0uKcfaDmf00phdv0h5NGNGlbhnN4G0azXDuMY3Yo92O7slNa_qID3_zQH7e377rj-z0dfysX0-Zpbyas1K2DBtUAo2rOBdKKukco6jAInLBWiaNEKpA1yrayHZTbrhqKuCyKS07ENj_2jDFGLDV57B6hkUXoLd43Wkf9Rav9_gVed4RXL0uHoOO1uNo0fmwyms3-f_hXxZGYYk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Cardinality estimation via learned dynamic sample selection</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</creator><creatorcontrib>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</creatorcontrib><description>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods. •We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</description><identifier>ISSN: 0306-4379</identifier><identifier>EISSN: 1873-6076</identifier><identifier>DOI: 10.1016/j.is.2023.102252</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Cardinality estimation ; Dynamic sample selection ; Neural network</subject><ispartof>Information systems (Oxford), 2023-07, Vol.117, p.102252, Article 102252</ispartof><rights>2023 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</cites><orcidid>0000-0001-9475-8944</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Wang, Run-An</creatorcontrib><creatorcontrib>Zou, Zhaonian</creatorcontrib><creatorcontrib>Jing, Ziqi</creatorcontrib><title>Cardinality estimation via learned dynamic sample selection</title><title>Information systems (Oxford)</title><description>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods. •We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</description><subject>Cardinality estimation</subject><subject>Dynamic sample selection</subject><subject>Neural network</subject><issn>0306-4379</issn><issn>1873-6076</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp1j01LxDAQQIMoWFfvHvsHWqf5aBM9SdFVWPCi55AmU0jpx5KUhf57W-rV0zAwb3iPkMcC8gKK8qnLfcwpULaulAp6RZJCViwroSqvSQIMyoyzSt2Suxg7AKBCqYS81CY4P5rez0uKcfaDmf00phdv0h5NGNGlbhnN4G0azXDuMY3Yo92O7slNa_qID3_zQH7e377rj-z0dfysX0-Zpbyas1K2DBtUAo2rOBdKKukco6jAInLBWiaNEKpA1yrayHZTbrhqKuCyKS07ENj_2jDFGLDV57B6hkUXoLd43Wkf9Rav9_gVed4RXL0uHoOO1uNo0fmwyms3-f_hXxZGYYk</recordid><startdate>202307</startdate><enddate>202307</enddate><creator>Wang, Run-An</creator><creator>Zou, Zhaonian</creator><creator>Jing, Ziqi</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-9475-8944</orcidid></search><sort><creationdate>202307</creationdate><title>Cardinality estimation via learned dynamic sample selection</title><author>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Cardinality estimation</topic><topic>Dynamic sample selection</topic><topic>Neural network</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Run-An</creatorcontrib><creatorcontrib>Zou, Zhaonian</creatorcontrib><creatorcontrib>Jing, Ziqi</creatorcontrib><collection>CrossRef</collection><jtitle>Information systems (Oxford)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Run-An</au><au>Zou, Zhaonian</au><au>Jing, Ziqi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cardinality estimation via learned dynamic sample selection</atitle><jtitle>Information systems (Oxford)</jtitle><date>2023-07</date><risdate>2023</risdate><volume>117</volume><spage>102252</spage><pages>102252-</pages><artnum>102252</artnum><issn>0306-4379</issn><eissn>1873-6076</eissn><abstract>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods. •We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.is.2023.102252</doi><orcidid>https://orcid.org/0000-0001-9475-8944</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0306-4379
ispartof Information systems (Oxford), 2023-07, Vol.117, p.102252, Article 102252
issn 0306-4379
1873-6076
language eng
recordid cdi_crossref_primary_10_1016_j_is_2023_102252
source ScienceDirect Freedom Collection 2022-2024
subjects Cardinality estimation
Dynamic sample selection
Neural network
title Cardinality estimation via learned dynamic sample selection
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T08%3A14%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cardinality%20estimation%20via%20learned%20dynamic%20sample%20selection&rft.jtitle=Information%20systems%20(Oxford)&rft.au=Wang,%20Run-An&rft.date=2023-07&rft.volume=117&rft.spage=102252&rft.pages=102252-&rft.artnum=102252&rft.issn=0306-4379&rft.eissn=1873-6076&rft_id=info:doi/10.1016/j.is.2023.102252&rft_dat=%3Celsevier_cross%3ES0306437923000881%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true