Modeling uncertain data using Monte Carlo integration method for clustering
Published in: | Expert systems with applications 2019-12, Vol.137, p.100-116 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: |
• Uncertain data is expressed as pds using a modified Monte Carlo integration method.
• Three distance metrics are presented to measure the similarity between two pds.
• These metrics are derived from the Kullback–Leibler (KL) and Jeffreys divergences.
• Finally, the k-medoids and DBSCAN clustering algorithms are modified to cluster uncertain data.
Data clustering has become an important task for the data mining research community because uncertain data is increasingly available in applications such as weather forecasting and business information management systems. In this work, the proposed Monte Carlo integration based technique for modeling uncertain objects is compared with three state-of-the-art methods, namely kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Kullback–Leibler and Jeffreys divergences are then used to measure the similarity between uncertain objects, which are grouped using modified DBSCAN and k-medoids clustering algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. All experiments are performed on one synthesized dataset and three real datasets: weather data, Japanese vowels, and activities of daily living data. Five performance measures (accuracy, precision, recall, F-score, and Jaccard index) are used to compare the proposed method with the state-of-the-art methods. Two non-parametric tests, the Wilcoxon rank-sum test and the sign test, are also conducted. The results indicate the effectiveness and efficiency of the proposed method over the state-of-the-art methods. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2019.06.050 |
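
The summary mentions measuring similarity between uncertain objects with Kullback–Leibler and Jeffreys divergences evaluated through Monte Carlo integration. The sketch below illustrates only that general idea and is not the article's modified method: it assumes each uncertain object is a one-dimensional Gaussian given as a `(mean, std)` pair, and the function names (`mc_kl_divergence`, `mc_jeffreys_divergence`, `exact_kl_gaussian`) are hypothetical. The closed-form Gaussian KL divergence is included purely as a sanity check for the estimator.

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    """Log-density of a 1-D Gaussian (an assumed model for an uncertain object)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def mc_kl_divergence(p, q, n_samples=20000, seed=0):
    """Plain Monte Carlo estimate of KL(p || q) = E_p[log p(x) - log q(x)].

    p and q are (mean, std) tuples; samples are drawn from p.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(p[0], p[1])
        total += gaussian_log_pdf(x, *p) - gaussian_log_pdf(x, *q)
    return total / n_samples


def mc_jeffreys_divergence(p, q, n_samples=20000, seed=0):
    """Jeffreys divergence as the symmetric sum KL(p || q) + KL(q || p)."""
    return (mc_kl_divergence(p, q, n_samples, seed)
            + mc_kl_divergence(q, p, n_samples, seed + 1))


def exact_kl_gaussian(p, q):
    """Closed-form KL divergence between two 1-D Gaussians, used as a check."""
    (m1, s1), (m2, s2) = p, q
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5


if __name__ == "__main__":
    obj_a = (0.0, 1.0)   # hypothetical uncertain object A
    obj_b = (1.0, 1.5)   # hypothetical uncertain object B
    print("MC KL(a||b):   ", mc_kl_divergence(obj_a, obj_b))
    print("exact KL(a||b):", exact_kl_gaussian(obj_a, obj_b))
    print("MC Jeffreys:   ", mc_jeffreys_divergence(obj_a, obj_b))
```

Because the Jeffreys divergence is symmetric, a matrix of pairwise values could serve as a precomputed dissimilarity input to a k-medoids or DBSCAN implementation; how the article modifies those algorithms is not stated in this record.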