Loading…

SALSA: Self-Adjusting Lean Streaming Analytics

Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms o...

Full description

Saved in:
Bibliographic Details
Main Authors: Basat, Ran Ben, Einziger, Gil, Mitzenmacher, Michael, Vargaftik, Shay
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites
container_end_page 875
container_issue
container_start_page 864
container_title
container_volume
creator Basat, Ran Ben
Einziger, Gil
Mitzenmacher, Michael
Vargaftik, Shay
description Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures.This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].
doi_str_mv 10.1109/ICDE51399.2021.00080
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9458734</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9458734</ieee_id><sourcerecordid>9458734</sourcerecordid><originalsourceid>FETCH-LOGICAL-i249t-2ec517ee4550ffe74bd56f5de4197b8ae077b406585b00b1b769c8855ab072a3</originalsourceid><addsrcrecordid>eNotjMFKw0AQQFdBsNR8gR7yA4kzuzuZXW8h1loIeEgP3spuMpGUNEgTD_17FT09HjyeUg8IOSL4x131vCE03ucaNOYA4OBKJZ4dsnbo0VlzrVbaMGWgi_dblczz8ScDbxEJVipvyropn9JGxj4ru-PXvAzTR1pLmNJmOUs4_Wo5hfGyDO18p276MM6S_HOt9i-bffWa1W_bXVXW2aCtXzItLSGLWCLoe2EbOyp66sSi5-iCAHO0UJCjCBAxcuFb54hCBNbBrNX933YQkcPneTiF8-XgLTk21nwDO9RCpw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>SALSA: Self-Adjusting Lean Streaming Analytics</title><source>IEEE Xplore All Conference Series</source><creator>Basat, Ran Ben ; Einziger, Gil ; Mitzenmacher, Michael ; Vargaftik, Shay</creator><creatorcontrib>Basat, Ran Ben ; Einziger, Gil ; Mitzenmacher, Michael ; Vargaftik, Shay</creatorcontrib><description>Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures.This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].</description><identifier>EISSN: 2375-026X</identifier><identifier>EISBN: 9781728191843</identifier><identifier>EISBN: 172819184X</identifier><identifier>DOI: 10.1109/ICDE51399.2021.00080</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Conferences ; Data engineering ; Encoding ; Heuristic algorithms ; Layout ; Measurement errors ; Merging</subject><ispartof>2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, p.864-875</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9458734$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9458734$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Basat, Ran Ben</creatorcontrib><creatorcontrib>Einziger, Gil</creatorcontrib><creatorcontrib>Mitzenmacher, Michael</creatorcontrib><creatorcontrib>Vargaftik, Shay</creatorcontrib><title>SALSA: Self-Adjusting Lean Streaming Analytics</title><title>2021 IEEE 37th International Conference on Data Engineering (ICDE)</title><addtitle>ICDE</addtitle><description>Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures.This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].</description><subject>Conferences</subject><subject>Data engineering</subject><subject>Encoding</subject><subject>Heuristic algorithms</subject><subject>Layout</subject><subject>Measurement errors</subject><subject>Merging</subject><issn>2375-026X</issn><isbn>9781728191843</isbn><isbn>172819184X</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2021</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjMFKw0AQQFdBsNR8gR7yA4kzuzuZXW8h1loIeEgP3spuMpGUNEgTD_17FT09HjyeUg8IOSL4x131vCE03ucaNOYA4OBKJZ4dsnbo0VlzrVbaMGWgi_dblczz8ScDbxEJVipvyropn9JGxj4ru-PXvAzTR1pLmNJmOUs4_Wo5hfGyDO18p276MM6S_HOt9i-bffWa1W_bXVXW2aCtXzItLSGLWCLoe2EbOyp66sSi5-iCAHO0UJCjCBAxcuFb54hCBNbBrNX933YQkcPneTiF8-XgLTk21nwDO9RCpw</recordid><startdate>20210401</startdate><enddate>20210401</enddate><creator>Basat, Ran Ben</creator><creator>Einziger, Gil</creator><creator>Mitzenmacher, Michael</creator><creator>Vargaftik, Shay</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20210401</creationdate><title>SALSA: Self-Adjusting Lean Streaming Analytics</title><author>Basat, Ran Ben ; Einziger, Gil ; Mitzenmacher, Michael ; Vargaftik, Shay</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i249t-2ec517ee4550ffe74bd56f5de4197b8ae077b406585b00b1b769c8855ab072a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Conferences</topic><topic>Data engineering</topic><topic>Encoding</topic><topic>Heuristic algorithms</topic><topic>Layout</topic><topic>Measurement errors</topic><topic>Merging</topic><toplevel>online_resources</toplevel><creatorcontrib>Basat, Ran Ben</creatorcontrib><creatorcontrib>Einziger, Gil</creatorcontrib><creatorcontrib>Mitzenmacher, Michael</creatorcontrib><creatorcontrib>Vargaftik, Shay</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Basat, Ran Ben</au><au>Einziger, Gil</au><au>Mitzenmacher, Michael</au><au>Vargaftik, Shay</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>SALSA: Self-Adjusting Lean Streaming Analytics</atitle><btitle>2021 IEEE 37th International Conference on Data Engineering (ICDE)</btitle><stitle>ICDE</stitle><date>2021-04-01</date><risdate>2021</risdate><spage>864</spage><epage>875</epage><pages>864-875</pages><eissn>2375-026X</eissn><eisbn>9781728191843</eisbn><eisbn>172819184X</eisbn><coden>IEEPAD</coden><abstract>Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures.This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters, and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open source [1].</abstract><pub>IEEE</pub><doi>10.1109/ICDE51399.2021.00080</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2375-026X
ispartof 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021, p.864-875
issn 2375-026X
language eng
recordid cdi_ieee_primary_9458734
source IEEE Xplore All Conference Series
subjects Conferences
Data engineering
Encoding
Heuristic algorithms
Layout
Measurement errors
Merging
title SALSA: Self-Adjusting Lean Streaming Analytics
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T23%3A10%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=SALSA:%20Self-Adjusting%20Lean%20Streaming%20Analytics&rft.btitle=2021%20IEEE%2037th%20International%20Conference%20on%20Data%20Engineering%20(ICDE)&rft.au=Basat,%20Ran%20Ben&rft.date=2021-04-01&rft.spage=864&rft.epage=875&rft.pages=864-875&rft.eissn=2375-026X&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICDE51399.2021.00080&rft.eisbn=9781728191843&rft.eisbn_list=172819184X&rft_dat=%3Cieee_CHZPO%3E9458734%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i249t-2ec517ee4550ffe74bd56f5de4197b8ae077b406585b00b1b769c8855ab072a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9458734&rfr_iscdi=true