Effective Removal of Privacy Breaches in Disassociated Transactional Datasets
Published in: | Arabian journal for science and engineering (2011) 2020-04, Vol.45 (4), p.3257-3272 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | A broad range of web activities, such as querying web pages, e-commerce transactions, health diagnosis and seat reservations, generate vast volumes of data, referred to as transactional data. These transactional data are published and widely used for data mining, research and analysis. However, publishing individuals’ transactional data raises serious privacy concerns for the individuals whose data have been published. The methods proposed in previous research to preserve privacy are suitable for structured relational data but are not well suited to anonymizing transactional data, since the latter are generally unstructured, sparse and high dimensional. This paper addresses the problem of privacy-preserving publication of transactional data using two enhanced versions of the ‘disassociation’ technique. Disassociation limits privacy breaches and increases the utility of the published data, but it does not eliminate the breaches, because it results in a cover problem that may lead to further privacy concerns. In this paper, we propose two algorithms to eliminate the cover problem of disassociated data: (i) improvement in disassociation using suppression and addition (IDSA) and (ii) improvement in disassociation by generalizing cover item (IDGC). The proposed algorithms are implemented on the INFORMS and BMS-Webview1 datasets and compared to disassociation with respect to prevention of privacy breaches as well as information loss. The results show that IDSA leads to a significant drop in privacy breaches due to the cover problem with minimal information loss, and that IDGC completely removes the privacy breaches due to the cover problem without any significant loss in data utility. |
---|---|
ISSN: | 2193-567X 1319-8025 2191-4281 |
DOI: | 10.1007/s13369-020-04353-5 |