Loading…

Collecting Preference Rankings Under Local Differential Privacy

With the deep penetration of the Internet and mobile devices, preference rankings are being collected on a massive scale by diverse data collectors for various business demands. However, users' preference rankings in many applications are highly sensitive. Without proper privacy protection mech...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering 2023-07, Vol.35 (7), p.6752-6766
Main Authors:	Cheng, Xiang, Yang, Jianyu, Wang, Yufei, Chen, Rui, Su, Sen, Li, Yuejia
Format:	Article
Language:	English
Subjects:	Business data collection Data models Datasets Differential privacy Domains Estimation Frequency estimation local differential privacy Machine learning Preference rankings Privacy riffle independent model Transforms
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	With the deep penetration of the Internet and mobile devices, preference rankings are being collected on a massive scale by diverse data collectors for various business demands. However, users' preference rankings in many applications are highly sensitive. Without proper privacy protection mechanisms, it either puts individual privacy in jeopardy or hampers business opportunities due to users' unwillingness to share their true rankings. In this paper, we initiate the study of collecting preference rankings under local differential privacy. The key technical challenge comes from the fact that the number of possible rankings could be large in practical settings, leading to excessive injected noise. To solve this problem, we present a novel approach SAFARI, whose main idea is to collect a set of distributions over small domains which are carefully chosen based on the riffle independent (RI) model to approximate the overall distribution of users' rankings, and then generate a synthetic ranking dataset from the obtained distributions. By working on small domains instead of a large domain, SAFARI can significantly reduce the magnitude of added noise. In SAFARI, we design two transformation rules, namely Rule I and Rule II, to instruct users to transform their data to provide the information about the distributions of the small domains. In particular, we propose a method called LADE to precisely estimate the required distributions used for the structure learning of RI model. We also propose a new LDP method called SAFA for frequency estimation over multiple attributes that have small domains. We formally prove that SAFARI guarantees \varepsilon ɛ -local differential privacy. Extensive experiments on real datasets confirm the effectiveness of SAFARI.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2022.3186907