Loading…
Efficient clustering of uncertain data streams
Clustering uncertain data streams has recently become one of the most challenging tasks in data management because of the strict space and time requirements of processing tuples arriving at high speed and the difficulty that arises from handling uncertain data. The prior work on clustering data stre...
Saved in:
Published in: | Knowledge and information systems 2014-09, Vol.40 (3), p.509-539 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Clustering uncertain data streams has recently become one of the most challenging tasks in data management because of the strict space and time requirements of processing tuples arriving at high speed and the difficulty that arises from handling uncertain data. The prior work on clustering data streams focuses on devising complicated synopsis data structures to summarize data streams into a small number of micro-clusters so that important statistics can be computed conveniently, such as Clustering Feature (
C
F) (Zhang et al. in Proceedings of ACM SIGMOD, pp 103–114,
1996
) for deterministic data and Error-based Clustering Feature (
E
CF) (Aggarwal and Yu in Proceedings of ICDE,
2008
) for uncertain data. However,
E
CF can only handle
attribute-level uncertainty
, while
existential uncertainty
, the other kind of uncertainty, has not been addressed yet. In this paper, we propose a novel data structure, Uncertain Feature (
U
F), to summarize data streams with both kinds of uncertainties:
U
F is space-efficient, has
additive
and
subtractive
properties, and can compute complicated statistics easily. Our first attempt aims at enhancing the previous streaming approaches to handle the sliding-window model by using
U
F instead of old synopses, inclusive of
C
luStream (Aggarwal et al. in Proceedings of VLDB,
2003
) and
U
Micro (Aggarwal and Yu in Proceedings of ICDE,
2008
). We show that such methods cannot achieve high efficiency. Our second attempt aims at devising a novel algorithm,
c
luUS , to handle the sliding-window model by using
U
F structure. Detailed analysis and thorough experimental reports on synthetic and real data sets confirm the advantages of our proposed method. |
---|---|
ISSN: | 0219-1377 0219-3116 |
DOI: | 10.1007/s10115-013-0657-3 |