
Extending the data warehouse for service provisioning data


Bibliographic Details
Published in: Data & Knowledge Engineering, 2006-12, Vol. 59 (3), p. 700-724
Main Author: Kotidis, Yannis
Format: Article
Language: English
Description
Summary: Over the last few years, an extensive body of literature on data warehousing applications has focused primarily on basket-type (transactional) data, common in retail industries. In this paper we focus on service provisioning data, that is, data recorded internally in an organization for provisioning certain business-related tasks. Coupling the recorded data with the underlying process and business practice(s) that generate them is crucial for end-to-end analysis. Our framework is based on a graph description, called a sketch, of the process that generates this data. Using this sketch, we formalize a new class of aggregate queries that consolidate data from a part of the process, based on a user-defined path expression. We then show how to build a compact, non-redundant collection of summary (aggregate) tables and indices for this new type of query. We first explore how to select a minimum set of views to answer queries with path expressions over the given sketch. For queries that also include aggregation, we define two partial orders among the views. The first is used to pick the minimum set of aggregate views that answers any query with no false dismissals, while the second describes an augmented set that allows fewer false positives. A non-materialized aggregate is computed through appropriate rewriting of the user query. We describe two indexing schemes that use phantom (non-materialized) aggregate values to expedite query processing. Experimental results show that these schemes perform well on synthetic and real datasets.
ISSN: 0169-023X, 1872-6933
DOI: 10.1016/j.datak.2005.11.003
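
Illustrative note: the summary describes aggregate queries that consolidate data from one part of a process graph (the sketch), selected by a path expression. The Python sketch below is only a rough illustration of that idea and is not the paper's formalism or algorithm. The step names, the `SKETCH` and `TRACES` data, and the functions `is_valid_trace` and `aggregate_between` are all hypothetical, and the "path expression" is simplified to a start step and an end step.

```python
# Hypothetical process sketch: a directed graph of provisioning steps.
SKETCH = {
    "order": ["credit_check"],
    "credit_check": ["install", "reject"],
    "install": ["activate"],
    "activate": [],
    "reject": [],
}

# Hypothetical recorded data: each trace lists (step, hours spent) in order.
TRACES = [
    [("order", 1.0), ("credit_check", 2.0), ("install", 5.0), ("activate", 1.0)],
    [("order", 0.5), ("credit_check", 3.0), ("reject", 0.5)],
    [("order", 1.5), ("credit_check", 1.0), ("install", 4.0), ("activate", 2.0)],
]


def is_valid_trace(trace, sketch):
    """Check that consecutive steps of a trace follow edges of the sketch."""
    steps = [step for step, _ in trace]
    return all(b in sketch.get(a, []) for a, b in zip(steps, steps[1:]))


def aggregate_between(traces, start, end):
    """Sum hours over the start..end segment of every trace that
    passes through both steps in that order.

    Returns (total_hours, number_of_matching_traces).
    """
    total, count = 0.0, 0
    for trace in traces:
        steps = [step for step, _ in trace]
        if start in steps and end in steps:
            i, j = steps.index(start), steps.index(end)
            if i <= j:
                total += sum(hours for _, hours in trace[i:j + 1])
                count += 1
    return total, count


if __name__ == "__main__":
    total, count = aggregate_between(TRACES, "credit_check", "activate")
    print(total, count)
```

In this toy setting, the query from `credit_check` to `activate` only consolidates the two traces that reach activation; the rejected order is excluded because it never follows that part of the process. A warehouse along the lines the summary describes would precompute and index such aggregates rather than scan raw traces as this sketch does.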