Cleaning Uncertain Data with Crowdsourcing - a General Model with Diverse Accuracy Rates

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2022-08, Vol. 34 (8), p. 1-1
Main Authors: Zhang, Chen; Zhang, Haodi; Xie, Weiteng; Liu, Nan; Li, Qifan; Wu, Kaishun; Jiang, Di; Lin, Peiguang; Chen, Lei
Format: Article
Language: English
Description
Summary: Uncertain data has emerged as an important problem in database systems due to the imprecise nature of many applications. To handle the uncertainty, probabilistic databases can store uncertain data and provide querying facilities that yield answers with confidence. However, the uncertainty may propagate, so the results of a query or mining process may not be useful. In this paper, we leverage the power of crowdsourcing by designing a set of Human Intelligence Tasks (HITs) to ask a crowd with diverse accuracy rates, in order to improve the quality of uncertain data. Each HIT is associated with a cost, so we need to design solutions that maximize data quality with a minimal number of HITs. Two obstacles make this optimization non-trivial and lead to very high computational cost for selecting the optimal set of HITs. First, members of a crowd may return incorrect answers with different probabilities. Second, the HITs decomposed from uncertain data are often correlated. We address these challenges by designing an effective approximation algorithm and an efficient heuristic solution, even under diverse individual accuracy rates of the crowdsourcing workers.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2020.3027545
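
The summary notes that crowd members may return incorrect answers with different probabilities. A minimal sketch of why that matters is given here, assuming a single yes/no fact and workers who err independently. It is not the paper's algorithm; the function name `posterior_true` and its parameters are hypothetical, chosen only for illustration. It applies Bayes' rule to show how answers from workers with different accuracy rates shift the probability attached to an uncertain fact.

```python
def posterior_true(prior, answers):
    """Posterior probability that an uncertain yes/no fact is true.

    prior:   probability that the fact is true before asking the crowd.
    answers: list of (said_true, accuracy) pairs, where accuracy is the
             assumed probability that this worker answers correctly.
             Workers are assumed to err independently (an illustrative
             simplification, not a claim about the paper's model).
    """
    weight_true = prior          # joint weight of "fact is true" and the answers
    weight_false = 1.0 - prior   # joint weight of "fact is false" and the answers
    for said_true, accuracy in answers:
        if said_true:
            weight_true *= accuracy
            weight_false *= 1.0 - accuracy
        else:
            weight_true *= 1.0 - accuracy
            weight_false *= accuracy
    total = weight_true + weight_false
    if total == 0.0:
        raise ValueError("answers are impossible under the given prior and accuracies")
    return weight_true / total


if __name__ == "__main__":
    # One reliable worker says "true", one less reliable worker says "false".
    print(posterior_true(0.5, [(True, 0.9), (False, 0.6)]))
```

In this sketch a worker with accuracy 0.5 leaves the prior unchanged, while a more accurate worker moves it further, which is one reason the choice of which tasks to ask, and of whom, affects how much data quality improves per unit cost.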