Resolution-Aware Query Answering for Business Intelligence

Bibliographic Details
Main Authors: Sismanis, Y., Ling Wang, Fuxman, A., Haas, P.J., Reinwald, B.
Format: Conference Proceeding
Language: English
Description
Summary: Entity uncertainty is an unavoidable problem in modern enterprise databases, resulting from the integration of data from multiple sources. In traditional warehousing, the administrator, during an ETL process, manually and laboriously resolves inconsistent data records to discover "true" entities (customers, products, etc.) and identify their "correct" attribute values. At any point in time, however, the current entity resolution is merely a best guess, and OLAP query results based on this resolution are inherently imprecise. We propose a new approach that maintains the data in an unresolved state and deals with entity uncertainty dynamically at query time. We enhance the traditional OLAP model to return not a single query answer, but rather upper and lower bounds on each OLAP aggregate. This approach avoids expensive entity-resolution processing and serves to identify potential risks when making business decisions based on the results of OLAP queries. By focusing on bounds rather than probability distributions, we can easily and efficiently process roll-up and group-by aggregation queries over all of the core aggregation functions. Moreover, our approach can be readily implemented in an existing RDBMS using SQL queries, and does not require the user to specify explicit probabilities for alternative entity resolutions. Experiments show that the overhead of our new OLAP functionality is small over a wide range of scenarios.
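To make the idea of bounded aggregates concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a deliberately simplified model in which each record carries a set of candidate entities it might resolve to, and all values are non-negative. The function name `sum_bounds` and the sample data are hypothetical. Under those assumptions, the lower bound of a group's SUM counts only records that certainly belong to it, and the upper bound counts every record that might.

```python
from collections import defaultdict


def sum_bounds(records):
    """Return {entity: (lower, upper)} bounds on SUM(value) per entity.

    records: iterable of (candidate_entities, value) pairs.
    Assumption (not from the paper): value >= 0, and a record belongs to
    exactly one of its candidate entities, but we do not know which.
    """
    lower = defaultdict(float)
    upper = defaultdict(float)
    for candidates, value in records:
        if value < 0:
            raise ValueError("this sketch assumes non-negative values")
        candidates = set(candidates)
        for entity in candidates:
            # Any record that might belong to the entity raises its upper bound.
            upper[entity] += value
            # Only an unambiguous record is guaranteed to count.
            lower[entity] += value if len(candidates) == 1 else 0.0
    return {entity: (lower[entity], upper[entity]) for entity in upper}


# Hypothetical sales records: the second one may belong to either customer.
sales = [
    ({"Acme"}, 100.0),
    ({"Acme", "Acme Corp"}, 40.0),
    ({"Acme Corp"}, 25.0),
]
bounds = sum_bounds(sales)
```

Here the ambiguous record widens the interval for both customers instead of forcing a single resolved answer, and a wide interval flags an aggregate whose value depends heavily on unresolved entities.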
ISSN: 1063-6382, 2375-026X
DOI: 10.1109/ICDE.2009.81