Efficient k-anonymous microaggregation of multivariate numerical data via principal component analysis

Bibliographic Details
Published in: Information sciences 2019-11, Vol. 503, p. 417-443
Main Authors: Monedero, David Rebollo; Mezher, Ahmad Mohamad; Colomé, Xavier Casanova; Forné, Jordi; Soriano, Miguel
Format: Article
Language: English
Description
Summary:
• The primary goal of this work is to reduce the running time of k-anonymous microaggregation algorithms operating on datasets with a large quantity of numerical demographic attributes, acting as quasi-identifiers. Principal component analysis (PCA), an algebraic-statistical procedure that constructs an orthogonal projection onto a lower-dimensional subspace, permits the effective reduction of the number of attributes of the original dataset. The optimality principles of multivariate PCA strive to preserve Euclidean distances between the projected data points.
• The compressed data is fed to the microaggregation algorithm, but the k-anonymous microcells or groups obtained are directly applied to the original data. The distance-preservation properties of multivariate PCA help construct a micropartition of the set of respondents similar to that obtained when the original data is microaggregated in the conventional fashion, but in fewer dimensions.
• This means that we are able to achieve significant time gains (≈ 14–31%) with very little impact on information utility (
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2019.07.042
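
The pipeline outlined in the summary (project with PCA, microaggregate the compressed data, then apply the resulting cells to the original data) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pca_project`, `greedy_cells`, and `microaggregate_via_pca` are hypothetical, the parameters `k` and `m` and the synthetic data are chosen only for demonstration, and the grouping step is a simple greedy fixed-size heuristic standing in for whichever microaggregation algorithm is actually used.

```python
import numpy as np

def pca_project(X, m):
    """Project the rows of X onto their top-m principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

def greedy_cells(Z, k):
    """Hypothetical stand-in for a microaggregation algorithm.

    Returns one cell label per record; every cell holds at least k records.
    """
    n = Z.shape[0]
    if n < k:
        raise ValueError("need at least k records")
    labels = np.full(n, -1)
    remaining = np.arange(n)
    cell = 0
    while remaining.size >= 2 * k:
        pts = Z[remaining]
        centroid = pts.mean(axis=0)
        # Pick the record farthest from the centroid of what is left.
        far = np.argmax(((pts - centroid) ** 2).sum(axis=1))
        # Group the k records nearest to that extreme record.
        dist = ((pts - pts[far]) ** 2).sum(axis=1)
        members = np.argsort(dist)[:k]
        labels[remaining[members]] = cell
        remaining = np.delete(remaining, members)
        cell += 1
    # Between k and 2k-1 records are left; they form the last cell.
    labels[remaining] = cell
    return labels

def microaggregate_via_pca(X, k, m):
    """Build cells on the PCA-compressed data, then aggregate the ORIGINAL data."""
    labels = greedy_cells(pca_project(X, m), k)
    X_anon = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        mask = labels == c
        # Replace each record by the centroid of its cell in the original space.
        X_anon[mask] = X[mask].mean(axis=0)
    return X_anon, labels

# Illustrative usage on synthetic data (assumed values, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X_anon, labels = microaggregate_via_pca(X, k=5, m=3)
```

The point of the design is that distances are only ever computed in the `m`-dimensional projected space, which is where any time saving would come from, while the centroids that replace the records are computed from the original attributes.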