Loading…
Private Blocking Technique for Multi-party Privacy-Preserving Record Linkage
The process of matching and integrating records that relate to the same entity from one or more datasets is known as record linkage, and it has become an increasingly important subject in many application areas, including business, government and health system. The data from these areas often contai...
Saved in:
Published in: | Data science and engineering 2017-06, Vol.2 (2), p.187-196 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The process of matching and integrating records that relate to the same entity from one or more datasets is known as record linkage, and it has become an increasingly important subject in many application areas, including business, government and health system. The data from these areas often contain sensitive information. To prevent privacy breaches, ideally records should be linked in a private way such that no information other than the matching result is leaked in the process, and this technique is called privacy-preserving record linkage (PPRL). With the increasing data, scalability becomes the main challenge of PPRL, and many private blocking techniques have been developed for PPRL. They are aimed at reducing the number of record pairs to be compared in the matching process by removing obvious non-matching pairs without compromising privacy. However, most of them are designed for two databases and they vary widely in their ability to balance competing goals of accuracy, efficiency and security. In this paper, we propose a novel private blocking approach for PPRL based on dynamic
k
-anonymous blocking and Paillier cryptosystem which can be applied on two or multiple databases. In dynamic
k
-anonymous blocking, our approach dynamically generates blocks satisfying
k
-anonymity and more accurate values to represent the blocks with varying
k
. We also propose a novel similarity measure method which performs on the numerical attributes and combines with Paillier cryptosystem to measure the similarity of two or more blocks in security, which provides strong privacy guarantees that none information reveals even collusion. Experiments conducted on a public dataset of voter registration records validate that our approach is scalable to large databases and keeps a high quality of blocking. We compare our method with other techniques and demonstrate the increases in security and accuracy. |
---|---|
ISSN: | 2364-1185 2364-1541 |
DOI: | 10.1007/s41019-017-0041-5 |