Loading…
Mathematically optimized, recursive prepartitioning strategies for k-anonymous microaggregation of large-scale datasets
•We reduce the running time of k-anonymous microaggregation of large-scale datasets.•A novel, mathematically optimized strategy for prepartitioning the dataset.•Our approach owes to the superadditive running time of microaggregation.•Running time and information loss are assessed over multiple datas...
Saved in:
Published in: | Expert systems with applications 2020-04, Vol.144, p.113086, Article 113086 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | •We reduce the running time of k-anonymous microaggregation of large-scale datasets.•A novel, mathematically optimized strategy for prepartitioning the dataset.•Our approach owes to the superadditive running time of microaggregation.•Running time and information loss are assessed over multiple datasets.
The technical contents of this work fall within the statistical disclosure control (SDC) field, which concerns the postprocessing of the demographic portion of the statistical results of surveys containing sensitive personal information, in order to effectively safeguard the anonymity of the participating respondents. A widely known technique to solve the problem of protecting the privacy of the respondents involved beyond the mere suppression of their identifiers is the k-anonymous microaggregation. Unfortunately, most microaggregation algorithms that produce competitively low levels of distortions exhibit a superlinear running time, typically scaling with the square of the number of records in the dataset.
This work proposes and analyzes an optimized prepartitioning strategy to reduce significantly the running time for the k-anonymous microaggregation algorithm operating on large datasets, with mild loss in data utility with respect to that of MDAV, the underlying method. The optimization strategy is based on prepartitioning a dataset recursively until the desired k-anonymity parameter is achieved. Traditional microaggregation algorithms have quadratic computational complexity in the form Θ(n2). By using the proposed method and fixing the number of recurrent prepartitions we obtain subquadratic complexity in the form Θ(n3/2), Θ(n4/3), ..., depending on the number of prepartitions. Alternatively, fixing the ratio between the size of the microcell and the macrocell on each prepartition, quasilinear complexity in the form Θ(nlog n) is achieved. Our method is readily applicable to large-scale datasets with numerical demographic attributes. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2019.113086 |