Loading…
Cardinality estimation via learned dynamic sample selection
Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been...
Saved in:
Published in: | Information systems (Oxford) 2023-07, Vol.117, p.102252, Article 102252 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3 |
container_end_page | |
container_issue | |
container_start_page | 102252 |
container_title | Information systems (Oxford) |
container_volume | 117 |
creator | Wang, Run-An Zou, Zhaonian Jing, Ziqi |
description | Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods.
•We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods. |
doi_str_mv | 10.1016/j.is.2023.102252 |
format | article |
fullrecord | <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_is_2023_102252</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0306437923000881</els_id><sourcerecordid>S0306437923000881</sourcerecordid><originalsourceid>FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</originalsourceid><addsrcrecordid>eNp1j01LxDAQQIMoWFfvHvsHWqf5aBM9SdFVWPCi55AmU0jpx5KUhf57W-rV0zAwb3iPkMcC8gKK8qnLfcwpULaulAp6RZJCViwroSqvSQIMyoyzSt2Suxg7AKBCqYS81CY4P5rez0uKcfaDmf00phdv0h5NGNGlbhnN4G0azXDuMY3Yo92O7slNa_qID3_zQH7e377rj-z0dfysX0-Zpbyas1K2DBtUAo2rOBdKKukco6jAInLBWiaNEKpA1yrayHZTbrhqKuCyKS07ENj_2jDFGLDV57B6hkUXoLd43Wkf9Rav9_gVed4RXL0uHoOO1uNo0fmwyms3-f_hXxZGYYk</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Cardinality estimation via learned dynamic sample selection</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</creator><creatorcontrib>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</creatorcontrib><description>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods.
•We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</description><identifier>ISSN: 0306-4379</identifier><identifier>EISSN: 1873-6076</identifier><identifier>DOI: 10.1016/j.is.2023.102252</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Cardinality estimation ; Dynamic sample selection ; Neural network</subject><ispartof>Information systems (Oxford), 2023-07, Vol.117, p.102252, Article 102252</ispartof><rights>2023 Elsevier Ltd</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</cites><orcidid>0000-0001-9475-8944</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Wang, Run-An</creatorcontrib><creatorcontrib>Zou, Zhaonian</creatorcontrib><creatorcontrib>Jing, Ziqi</creatorcontrib><title>Cardinality estimation via learned dynamic sample selection</title><title>Information systems (Oxford)</title><description>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods.
•We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</description><subject>Cardinality estimation</subject><subject>Dynamic sample selection</subject><subject>Neural network</subject><issn>0306-4379</issn><issn>1873-6076</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp1j01LxDAQQIMoWFfvHvsHWqf5aBM9SdFVWPCi55AmU0jpx5KUhf57W-rV0zAwb3iPkMcC8gKK8qnLfcwpULaulAp6RZJCViwroSqvSQIMyoyzSt2Suxg7AKBCqYS81CY4P5rez0uKcfaDmf00phdv0h5NGNGlbhnN4G0azXDuMY3Yo92O7slNa_qID3_zQH7e377rj-z0dfysX0-Zpbyas1K2DBtUAo2rOBdKKukco6jAInLBWiaNEKpA1yrayHZTbrhqKuCyKS07ENj_2jDFGLDV57B6hkUXoLd43Wkf9Rav9_gVed4RXL0uHoOO1uNo0fmwyms3-f_hXxZGYYk</recordid><startdate>202307</startdate><enddate>202307</enddate><creator>Wang, Run-An</creator><creator>Zou, Zhaonian</creator><creator>Jing, Ziqi</creator><general>Elsevier Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-9475-8944</orcidid></search><sort><creationdate>202307</creationdate><title>Cardinality estimation via learned dynamic sample selection</title><author>Wang, Run-An ; Zou, Zhaonian ; Jing, Ziqi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Cardinality estimation</topic><topic>Dynamic sample selection</topic><topic>Neural network</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wang, Run-An</creatorcontrib><creatorcontrib>Zou, Zhaonian</creatorcontrib><creatorcontrib>Jing, Ziqi</creatorcontrib><collection>CrossRef</collection><jtitle>Information systems (Oxford)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wang, Run-An</au><au>Zou, Zhaonian</au><au>Jing, Ziqi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cardinality estimation via learned dynamic sample selection</atitle><jtitle>Information systems (Oxford)</jtitle><date>2023-07</date><risdate>2023</risdate><volume>117</volume><spage>102252</spage><pages>102252-</pages><artnum>102252</artnum><issn>0306-4379</issn><eissn>1873-6076</eissn><abstract>Sampling is an effective approach to cardinality estimation which in turn is a key to query optimization in a DBMS. Although there have been a lot of studies on applying machine learning to cardinality estimation recently, enhancing sampling-based cardinality estimation by machine learning has been overlooked for a long time. In this paper, we propose a new sampling-based cardinality estimation method called LDSS by developing a learning-based dynamic sample selection method. Unlike the existing sampling-based methods that perform online sampling for every query, our method selects for the query the most suitable sample from the set of samples of various sizes that have been materialized during preprocessing. The cardinality of the query is then estimated based on the selected sample. Since our method is based on sampling, it can handle both single-table queries and join queries. Due to dynamic sample selection, costly online sampling is completely avoided. By learning the complex relationships between samples and queries, our learned sample selector can recommend small yet good samples for input queries. The extensive evaluation performed on the benchmarks indicates that LDSS can be trained much faster and can achieve higher accuracy than the state-of-the-art query-dependent methods and comparable accuracy to the current data-driven methods.
•We propose the first cardinality estimation framework using dynamic sample selection.•We propose LDSS, a learning-based approach to dynamic sample selection.•We develop a method to incrementally update the learned sample selection model.•We experimentally evaluated the performance of LDSS using a set of benchmarks.•LDSS can achieve better accuracy than the state-of-the-art query-dependent methods.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.is.2023.102252</doi><orcidid>https://orcid.org/0000-0001-9475-8944</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0306-4379 |
ispartof | Information systems (Oxford), 2023-07, Vol.117, p.102252, Article 102252 |
issn | 0306-4379 1873-6076 |
language | eng |
recordid | cdi_crossref_primary_10_1016_j_is_2023_102252 |
source | ScienceDirect Freedom Collection 2022-2024 |
subjects | Cardinality estimation Dynamic sample selection Neural network |
title | Cardinality estimation via learned dynamic sample selection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T08%3A14%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cardinality%20estimation%20via%20learned%20dynamic%20sample%20selection&rft.jtitle=Information%20systems%20(Oxford)&rft.au=Wang,%20Run-An&rft.date=2023-07&rft.volume=117&rft.spage=102252&rft.pages=102252-&rft.artnum=102252&rft.issn=0306-4379&rft.eissn=1873-6076&rft_id=info:doi/10.1016/j.is.2023.102252&rft_dat=%3Celsevier_cross%3ES0306437923000881%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c247t-68f3ebe95ead74459898dd32e90cee453f38a5591edf92b8f6076b49b7048b6c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |