
Performance evaluation on data reconciliation algorithm in distributed system


Bibliographic Details
Main Authors: Xin Wang, Hongming Zhu, Qin Liu, Xiaowen Yang, Jiakai Xiao
Format: Conference Proceeding
Language: English
Description
Summary: This research originated in a university-enterprise cooperation program that aims to improve the efficiency of data reconciliation between two large-scale data sources. The paper presents three typical algorithms: the standard Bloom filter (BF), the counting Bloom filter (CBF), and the invertible Bloom filter (IBF). To evaluate their performance, chiefly in terms of runtime and accuracy rate, a series of experiments was designed and run on both a small-scale and a large-scale distributed system. The algorithms are compared against a traditional query method, Inner Join (IJ), as the baseline. The results show that, under the MapReduce computing framework, Inner Join performs best, followed closely by BF, and that a large-scale distributed system clearly improves performance when processing large-scale data.
ISSN: 2376-5933, 2376-595X
DOI: 10.1109/CCIS.2012.6664432