Loading…
Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures
Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR Erlingsson et al. (2014) and Apple's count-mean sketch (CMS) algorithm D. P. T. Apple,...
Saved in:
Published in: | IEEE transactions on knowledge and data engineering 2023-08, Vol.35 (8), p.8185-8198 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | cdi_FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3 |
---|---|
cites | cdi_FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3 |
container_end_page | 8198 |
container_issue | 8 |
container_start_page | 8185 |
container_title | IEEE transactions on knowledge and data engineering |
container_volume | 35 |
creator | Vatsalan, Dinusha Bhaskar, Raghav Kaafar, Mohamed Ali |
description | Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR Erlingsson et al. (2014) and Apple's count-mean sketch (CMS) algorithm D. P. T. Apple, (2017) propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. Since the size of the data structure in these methods is not adaptive to the volume of the streaming data, the utility (accuracy of the count estimate) can suffer over time due to increased false positive rates. Further, the lookup operation needs to be highly efficient to answer count estimate queries in real-time. More importantly, the local Differential privacy mechanisms used in these approaches to provide privacy guarantees come at a large cost to utility (impacting the accuracy of count estimation). In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. Our algorithm uses a combination of two probabilistic data structures Cuckoo filter and Bloom filter. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead. |
doi_str_mv | 10.1109/TKDE.2022.3198478 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2834306314</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9855874</ieee_id><sourcerecordid>2834306314</sourcerecordid><originalsourceid>FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3</originalsourceid><addsrcrecordid>eNo9kN9LwzAQx4MoOKd_gPgS8Lkzl6RN-ij7oeLAgdurIW0TyejWmaTC9tfb0uHTHff93B18ELoHMgEg-dP6fTafUELphEEuuZAXaARpKhMKOVx2PeGQcMbFNboJYUsIkULCCH0tm1LXeOasNd7so9N1fcQr7351NHjRnk5HPG3aLth_Y7fHn9EbvcMzHTXehH648k2hC1e7EF05BB3UlrH1JtyiK6vrYO7OdYw2i_l6-posP17eps_LpKQ5iwkTwggpmcgyDtxCJQhQKi2DikOZ8VRnttJZKkxKmNXWAks1FBWrmK0yW7AxehzuHnzz05oQ1bZp_b57qahknJGMAe8oGKjSNyF4Y9XBu532RwVE9RpVr1H1GtVZY7fzMOw4Y8w_n8vOreDsD-L2boE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2834306314</pqid></control><display><type>article</type><title>Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Vatsalan, Dinusha ; Bhaskar, Raghav ; Kaafar, Mohamed Ali</creator><creatorcontrib>Vatsalan, Dinusha ; Bhaskar, Raghav ; Kaafar, Mohamed Ali</creatorcontrib><description>Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR Erlingsson et al. (2014) and Apple's count-mean sketch (CMS) algorithm D. P. T. Apple, (2017) propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. Since the size of the data structure in these methods is not adaptive to the volume of the streaming data, the utility (accuracy of the count estimate) can suffer over time due to increased false positive rates. Further, the lookup operation needs to be highly efficient to answer count estimate queries in real-time. More importantly, the local Differential privacy mechanisms used in these approaches to provide privacy guarantees come at a large cost to utility (impacting the accuracy of count estimation). In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. Our algorithm uses a combination of two probabilistic data structures Cuckoo filter and Bloom filter. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead.</description><identifier>ISSN: 1041-4347</identifier><identifier>EISSN: 1558-2191</identifier><identifier>DOI: 10.1109/TKDE.2022.3198478</identifier><identifier>CODEN: ITKEEH</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Accuracy ; Algorithms ; Bloom filter ; Counting ; Cuckoo filter ; data streams ; Data structures ; Error correction ; Estimation ; Filtering algorithms ; fuzzy counting ; Fuzzy sets ; Local differential privacy ; Matched filters ; Privacy ; Probabilistic logic ; Real time ; real-time querying ; Real-time systems ; Response time (computers)</subject><ispartof>IEEE transactions on knowledge and data engineering, 2023-08, Vol.35 (8), p.8185-8198</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3</citedby><cites>FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3</cites><orcidid>0000-0001-6713-7667 ; 0000-0003-2714-0276</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9855874$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,27903,27904,54774</link.rule.ids></links><search><creatorcontrib>Vatsalan, Dinusha</creatorcontrib><creatorcontrib>Bhaskar, Raghav</creatorcontrib><creatorcontrib>Kaafar, Mohamed Ali</creatorcontrib><title>Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures</title><title>IEEE transactions on knowledge and data engineering</title><addtitle>TKDE</addtitle><description>Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR Erlingsson et al. (2014) and Apple's count-mean sketch (CMS) algorithm D. P. T. Apple, (2017) propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. Since the size of the data structure in these methods is not adaptive to the volume of the streaming data, the utility (accuracy of the count estimate) can suffer over time due to increased false positive rates. Further, the lookup operation needs to be highly efficient to answer count estimate queries in real-time. More importantly, the local Differential privacy mechanisms used in these approaches to provide privacy guarantees come at a large cost to utility (impacting the accuracy of count estimation). In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. Our algorithm uses a combination of two probabilistic data structures Cuckoo filter and Bloom filter. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Bloom filter</subject><subject>Counting</subject><subject>Cuckoo filter</subject><subject>data streams</subject><subject>Data structures</subject><subject>Error correction</subject><subject>Estimation</subject><subject>Filtering algorithms</subject><subject>fuzzy counting</subject><subject>Fuzzy sets</subject><subject>Local differential privacy</subject><subject>Matched filters</subject><subject>Privacy</subject><subject>Probabilistic logic</subject><subject>Real time</subject><subject>real-time querying</subject><subject>Real-time systems</subject><subject>Response time (computers)</subject><issn>1041-4347</issn><issn>1558-2191</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNo9kN9LwzAQx4MoOKd_gPgS8Lkzl6RN-ij7oeLAgdurIW0TyejWmaTC9tfb0uHTHff93B18ELoHMgEg-dP6fTafUELphEEuuZAXaARpKhMKOVx2PeGQcMbFNboJYUsIkULCCH0tm1LXeOasNd7so9N1fcQr7351NHjRnk5HPG3aLth_Y7fHn9EbvcMzHTXehH648k2hC1e7EF05BB3UlrH1JtyiK6vrYO7OdYw2i_l6-posP17eps_LpKQ5iwkTwggpmcgyDtxCJQhQKi2DikOZ8VRnttJZKkxKmNXWAks1FBWrmK0yW7AxehzuHnzz05oQ1bZp_b57qahknJGMAe8oGKjSNyF4Y9XBu532RwVE9RpVr1H1GtVZY7fzMOw4Y8w_n8vOreDsD-L2boE</recordid><startdate>20230801</startdate><enddate>20230801</enddate><creator>Vatsalan, Dinusha</creator><creator>Bhaskar, Raghav</creator><creator>Kaafar, Mohamed Ali</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0001-6713-7667</orcidid><orcidid>https://orcid.org/0000-0003-2714-0276</orcidid></search><sort><creationdate>20230801</creationdate><title>Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures</title><author>Vatsalan, Dinusha ; Bhaskar, Raghav ; Kaafar, Mohamed Ali</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Bloom filter</topic><topic>Counting</topic><topic>Cuckoo filter</topic><topic>data streams</topic><topic>Data structures</topic><topic>Error correction</topic><topic>Estimation</topic><topic>Filtering algorithms</topic><topic>fuzzy counting</topic><topic>Fuzzy sets</topic><topic>Local differential privacy</topic><topic>Matched filters</topic><topic>Privacy</topic><topic>Probabilistic logic</topic><topic>Real time</topic><topic>real-time querying</topic><topic>Real-time systems</topic><topic>Response time (computers)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Vatsalan, Dinusha</creatorcontrib><creatorcontrib>Bhaskar, Raghav</creatorcontrib><creatorcontrib>Kaafar, Mohamed Ali</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on knowledge and data engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Vatsalan, Dinusha</au><au>Bhaskar, Raghav</au><au>Kaafar, Mohamed Ali</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures</atitle><jtitle>IEEE transactions on knowledge and data engineering</jtitle><stitle>TKDE</stitle><date>2023-08-01</date><risdate>2023</risdate><volume>35</volume><issue>8</issue><spage>8185</spage><epage>8198</epage><pages>8185-8198</pages><issn>1041-4347</issn><eissn>1558-2191</eissn><coden>ITKEEH</coden><abstract>Privacy-preserving estimation of counts of items in streaming data finds applications in several real-world scenarios including word auto-correction and traffic management applications. Recent works of RAPPOR Erlingsson et al. (2014) and Apple's count-mean sketch (CMS) algorithm D. P. T. Apple, (2017) propose privacy preserving mechanisms for count estimation in large volumes of data using probabilistic data structures like counting Bloom filter and CMS. However, these existing methods fall short in providing a sound solution for real-time streaming data applications. Since the size of the data structure in these methods is not adaptive to the volume of the streaming data, the utility (accuracy of the count estimate) can suffer over time due to increased false positive rates. Further, the lookup operation needs to be highly efficient to answer count estimate queries in real-time. More importantly, the local Differential privacy mechanisms used in these approaches to provide privacy guarantees come at a large cost to utility (impacting the accuracy of count estimation). In this work, we propose a novel (local) Differentially private mechanism that provides high utility for the streaming data count estimation problem with similar or even lower privacy budgets while providing: a) fuzzy counting to report counts of related or similar items (for instance to account for typing errors and data variations), and b) improved querying efficiency to reduce the response time for real-time querying of counts. Our algorithm uses a combination of two probabilistic data structures Cuckoo filter and Bloom filter. We provide formal proofs for privacy and utility guarantees and present extensive experimental evaluation of our algorithm using real and synthetic English words datasets for both the exact and fuzzy counting scenarios. Our privacy preserving mechanism substantially outperforms the prior work in terms of lower querying time, significantly higher utility (accuracy of count estimation) under similar or lower privacy guarantees, at the cost of communication overhead.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TKDE.2022.3198478</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0001-6713-7667</orcidid><orcidid>https://orcid.org/0000-0003-2714-0276</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1041-4347 |
ispartof | IEEE transactions on knowledge and data engineering, 2023-08, Vol.35 (8), p.8185-8198 |
issn | 1041-4347 1558-2191 |
language | eng |
recordid | cdi_proquest_journals_2834306314 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Accuracy Algorithms Bloom filter Counting Cuckoo filter data streams Data structures Error correction Estimation Filtering algorithms fuzzy counting Fuzzy sets Local differential privacy Matched filters Privacy Probabilistic logic Real time real-time querying Real-time systems Response time (computers) |
title | Local Differentially Private Fuzzy Counting in Stream Data Using Probabilistic Data Structures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T00%3A42%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Local%20Differentially%20Private%20Fuzzy%20Counting%20in%20Stream%20Data%20Using%20Probabilistic%20Data%20Structures&rft.jtitle=IEEE%20transactions%20on%20knowledge%20and%20data%20engineering&rft.au=Vatsalan,%20Dinusha&rft.date=2023-08-01&rft.volume=35&rft.issue=8&rft.spage=8185&rft.epage=8198&rft.pages=8185-8198&rft.issn=1041-4347&rft.eissn=1558-2191&rft.coden=ITKEEH&rft_id=info:doi/10.1109/TKDE.2022.3198478&rft_dat=%3Cproquest_cross%3E2834306314%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c293t-377e7883766414f1d701228f31d41c645a6fda657e503faff135a1bd3d3fd6fb3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2834306314&rft_id=info:pmid/&rft_ieee_id=9855874&rfr_iscdi=true |