Loading…
Possibilistic Data Cleaning
Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples...
Saved in:
Published in: | IEEE transactions on knowledge and data engineering 2022-12, Vol.34 (12), p.5939-5950 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose not to view data as dirty but the uncertainty information about data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. By mining the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable. |
---|---|
ISSN: | 1041-4347 1558-2191 |
DOI: | 10.1109/TKDE.2021.3062318 |