
Private Blocking Technique for Multi-party Privacy-Preserving Record Linkage

Bibliographic Details
Published in: Data Science and Engineering, 2017-06, Vol. 2 (2), pp. 187-196
Main Authors: Han, Shumin; Shen, Derong; Nie, Tiezheng; Kou, Yue; Yu, Ge
Format: Article
Language: English
Description
Summary: The process of matching and integrating records that relate to the same entity from one or more datasets is known as record linkage, and it has become an increasingly important subject in many application areas, including business, government and health systems. The data from these areas often contain sensitive information. To prevent privacy breaches, records should ideally be linked privately, so that no information other than the matching result is leaked in the process; this technique is called privacy-preserving record linkage (PPRL). As data volumes grow, scalability becomes the main challenge of PPRL, and many private blocking techniques have been developed for it. They aim to reduce the number of record pairs compared in the matching process by removing obvious non-matching pairs without compromising privacy. However, most of them are designed for two databases, and they vary widely in their ability to balance the competing goals of accuracy, efficiency and security. In this paper, we propose a novel private blocking approach for PPRL based on dynamic k-anonymous blocking and the Paillier cryptosystem, which can be applied to two or more databases. In dynamic k-anonymous blocking, our approach dynamically generates blocks that satisfy k-anonymity, together with more accurate values to represent the blocks, as k varies. We also propose a novel similarity measure that operates on numerical attributes and is combined with the Paillier cryptosystem to securely measure the similarity of two or more blocks, providing strong privacy guarantees that no information is revealed even under collusion. Experiments conducted on a public dataset of voter registration records validate that our approach scales to large databases and maintains high blocking quality. We compare our method with other techniques and demonstrate improvements in security and accuracy.
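The summary names two ingredients: k-anonymous blocks summarized by representative values, and the Paillier cryptosystem used to compare blocks on numerical attributes. The Python sketch below shows generic textbook versions of both. It is not the authors' algorithm or protocol. The grouping rule (sort, cut into runs of k, merge a short tail), the integer-mean representative, the names `k_anonymous_blocks`, `representative` and `add_encrypted`, the threshold and the sample ages are all hypothetical. The Paillier part uses tiny fixed primes and is insecure. It also reveals the exact plaintext difference to the key holder, which a real PPRL protocol would have to mask.

```python
import math
import random

# --- Toy Paillier cryptosystem (INSECURE: tiny fixed primes, illustration only) ---

def keygen(p=1009, q=1013):
    """Return (public_key, private_key) using the common choice g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return (n, n + 1), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(1, n)
    return pow(g, m % n, n2) * pow(r, n, n2) % n2  # m % n encodes negatives

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    m = (x - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer

def add_encrypted(pub, c1, c2):
    """Additive homomorphism: Enc(a) * Enc(b) mod n^2 decrypts to a + b."""
    n, _ = pub
    return c1 * c2 % (n * n)

# --- Generic k-anonymous blocking on one numerical attribute (hypothetical) ---

def k_anonymous_blocks(values, k):
    """Sort, cut into runs of k, and fold a short tail into the previous
    block so every block holds at least k values."""
    if len(values) < k:
        raise ValueError("need at least k values")
    ordered = sorted(values)
    blocks = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(blocks[-1]) < k:
        blocks[-2].extend(blocks.pop())
    return blocks

def representative(block):
    return sum(block) // len(block)  # integer mean, so it can be encrypted

# --- Toy two-party comparison of block representatives ---
pub, priv = keygen()
ages_a = [23, 25, 27, 34, 36, 38, 58, 60, 61]  # made-up data
ages_b = [24, 26, 29, 35, 37, 40, 57, 59, 62]  # made-up data
reps_a = [representative(b) for b in k_anonymous_blocks(ages_a, 3)]
reps_b = [representative(b) for b in k_anonymous_blocks(ages_b, 3)]

enc_a = [encrypt(pub, v) for v in reps_a]  # party A shares only ciphertexts
THRESHOLD = 2  # hypothetical similarity threshold
candidates = []
for i, ca in enumerate(enc_a):
    for j, vb in enumerate(reps_b):
        # Party B forms Enc(a - b) without seeing a; the key holder decrypts.
        diff = decrypt(priv, add_encrypted(pub, ca, encrypt(pub, -vb)))
        if abs(diff) <= THRESHOLD:
            candidates.append((i, j))

print(candidates)  # [(0, 0), (1, 1), (2, 2)]
```

With k = 3, each party's ages split into three blocks. The representatives are 25, 36, 59 for party A and 26, 37, 59 for party B. Only block pairs whose representatives differ by at most the threshold become candidate pairs for detailed matching, which is the comparison-reducing role the summary gives to blocking.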
ISSN: 2364-1185, 2364-1541
DOI: 10.1007/s41019-017-0041-5