Efficient k-anonymous microaggregation of multivariate numerical data via principal component analysis
Published in: | Information sciences 2019-11, Vol.503, p.417-443 |
---|---|
Format: | Article |
Language: | English |
Summary: | • The primary goal of this work is to reduce the running time of k-anonymous microaggregation algorithms operating on datasets with a large number of numerical demographic attributes acting as quasi-identifiers. Principal component analysis (PCA), an algebraic-statistical procedure that constructs an orthogonal projection onto a lower-dimensional subspace, permits the effective reduction of the number of attributes of the original dataset. The optimality principles of multivariate PCA strive to preserve Euclidean distances between the projected data points. • The compressed data is fed to the microaggregation algorithm, but the k-anonymous microcells (groups) obtained are applied directly to the original data. The distance-preservation properties of multivariate PCA help construct a micropartition of the set of respondents similar to the one obtained when the original data is microaggregated in the conventional fashion, but working in fewer dimensions. • This means that we are able to achieve significant time gains (≈ 14–31%) with very little impact on information utility ( |
---|---|
ISSN: | 0020-0255 1872-6291 |
DOI: | 10.1016/j.ins.2019.07.042 |
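
The pipeline described in the summary (project with PCA, partition the projected data, apply the partition to the original data) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the grouping step is a simplified greedy heuristic standing in for whichever microaggregation algorithm the paper uses, and the function names (`pca_project`, `greedy_microcells`, `microaggregate`, `pca_microaggregate`), the synthetic data, and the parameter values are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def greedy_microcells(Z, k):
    """Assign every row of Z to a cell with at least k rows.

    Simplified heuristic (an assumption, not the paper's algorithm):
    repeatedly take the point farthest from the centroid of the remaining
    points and group it with its k - 1 nearest neighbours.
    """
    n = len(Z)
    assert n >= k, "need at least k records"
    labels = np.full(n, -1)
    remaining = np.arange(n)
    cell = 0
    while len(remaining) >= 2 * k:
        pts = Z[remaining]
        far = np.argmax(((pts - pts.mean(axis=0)) ** 2).sum(axis=1))
        dist = ((pts - pts[far]) ** 2).sum(axis=1)
        nearest = np.argsort(dist)[:k]  # k closest points, seed included
        labels[remaining[nearest]] = cell
        remaining = np.delete(remaining, nearest)
        cell += 1
    labels[remaining] = cell  # the last k..2k-1 records form the final cell
    return labels

def microaggregate(X, labels):
    """Replace each row of X by the centroid of its cell."""
    X_anon = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        mask = labels == c
        X_anon[mask] = X[mask].mean(axis=0)
    return X_anon

def pca_microaggregate(X, k, n_components):
    """Build the cells in the reduced space, then aggregate the original data."""
    Z = pca_project(X, n_components)
    labels = greedy_microcells(Z, k)
    return microaggregate(X, labels), labels

# Synthetic example: 200 records, 10 numerical quasi-identifiers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X_anon, labels = pca_microaggregate(X, k=5, n_components=3)
```

In this sketch the distance computations inside `greedy_microcells` run on 3 columns instead of 10, which is where a time saving would come from, while the centroids that replace the records are computed on the full original data.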