Performance evaluation on data reconciliation algorithm in distributed system
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | This research originated in a school-enterprise cooperation program that aims to improve the efficiency of data reconciliation between two large-scale data sources. The paper presents three typical algorithms: the standard Bloom filter (BF), the counting Bloom filter (CBF), and the invertible Bloom filter (IBF). To evaluate their performance, mainly in terms of runtime and accuracy rate, a series of experiments was designed and run on both a small-scale and a large-scale distributed system. The algorithms are compared against a traditional query method, Inner Join (IJ). The results show that, under the MapReduce computing framework, Inner Join performs best, closely followed by BF, and that a large-scale distributed system markedly improves performance when processing large-scale data. |
---|---|
ISSN: | 2376-5933 2376-595X |
DOI: | 10.1109/CCIS.2012.6664432 |