Privacy Preserving Clustering by Cluster Bulging for Information Sustenance

Bibliographic Details
Main Authors: Kadampur, M.A., Somayajulu, D.V.L.N., Dhiraj, S.S.S., Satyam, S.G.P.
Format: Conference Proceeding
Language:English
Description
Summary:Cluster analysis is a data mining approach for unsupervised learning. However, the use of clustering as a data mining tool has been a cause of growing concern because it can violate individual privacy. This paper presents a method for privacy preserving clustering through cluster bulging. In this method, the objects of the database are first grouped into clusters based on a similarity measure. The data in these clusters is then perturbed in a controlled manner by modifying the values of the objects, so that the clusters in the perturbed data set are bulged in comparison with those in the original data set. To perform this perturbation, every cluster is displaced along the line joining its centroid to the centroid of the whole data set. Then every object in each cluster is shifted along the line joining that object to the centroid of its cluster. The word bulging here refers to both positive and negative bulging. In essence, the method manipulates the similarity measures and recomputes the perturbed objects of the respective clusters, so every object in a bulged cluster represents its corresponding object in the original cluster. After the method is applied, the objects are perturbed, while the number of member objects and the shape of each cluster remain the same as in the original clusters. The information in the two versions of the data set is thereby sustained, while the privacy of sensitive data is preserved.
ISSN:2151-1802, 2151-1810
DOI:10.1109/ICIAFS.2008.4783947
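
The two perturbation steps described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bulge_clusters` and the parameters `displacement` and `bulge` are hypothetical, cluster labels are assumed to be already computed, and "positive" and "negative" bulging are interpreted here as scaling each object's offset from its cluster centroid by a factor above or below 1.

```python
from collections import defaultdict


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def bulge_clusters(points, labels, displacement=0.1, bulge=0.2):
    """Perturb clustered points in two steps (illustrative sketch only).

    displacement and bulge are assumed tuning parameters, not values
    taken from the paper. A bulge above 0 expands a cluster, a bulge
    between -1 and 0 contracts it.
    """
    global_c = centroid(points)

    # Group point indices by their (precomputed) cluster label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    perturbed = [None] * len(points)
    for indices in groups.values():
        c = centroid([points[i] for i in indices])

        # Step 1: displace the cluster along the line joining its
        # centroid to the centroid of the whole data set.
        new_c = [ci + displacement * (ci - gi) for ci, gi in zip(c, global_c)]

        # Step 2: shift each object along the line joining it to its
        # cluster centroid by scaling its offset from that centroid.
        for i in indices:
            perturbed[i] = [
                nc + (1.0 + bulge) * (x - ci)
                for nc, x, ci in zip(new_c, points[i], c)
            ]
    return perturbed
```

Because every offset within a cluster is scaled by the same factor, the number of members and the relative shape of each cluster are unchanged in this sketch, which mirrors the property the summary claims for the method.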