Modeling uncertain data using Monte Carlo integration method for clustering

• Uncertain data is expressed as pds using a modified Monte-Carlo Integration method.
• Three distance metrics have been presented to find the similarity between two pds.
• These metrics have been derived from the notion of KL & Jeffreys divergences.
• Finally, k-medoid & DBSCAN clustering algorith...

Bibliographic Details
Published in: Expert systems with applications 2019-12, Vol.137, p.100-116
Main Authors: Sharma, Krishna Kumar; Seal, Ayan
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93
cites cdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93
container_end_page 116
container_issue
container_start_page 100
container_title Expert systems with applications
container_volume 137
creator Sharma, Krishna Kumar
Seal, Ayan
description • Uncertain data is expressed as pds using a modified Monte Carlo integration method. • Three distance metrics are presented to measure the similarity between two pds. • These metrics are derived from the notion of KL & Jeffreys divergences. • Finally, the k-medoids & DBSCAN clustering algorithms are modified for clustering. Data clustering is an important task for the data mining research community, since the availability of uncertain data is increasing rapidly in many applications such as weather forecasting and business information management systems. In this work, the proposed Monte Carlo integration based technique for modeling uncertain objects is compared with three state-of-the-art methods, namely kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Kullback–Leibler and Jeffreys divergences are then used to measure the similarity between uncertain objects, which are clustered using modified DBSCAN and k-medoids algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. All experiments are performed on one synthesized dataset and three real datasets, namely weather data, Japanese vowels, and activity of daily living data. Five performance measures, namely accuracy, precision, recall, F-score, and Jaccard index, are used to compare the proposed method with state-of-the-art methods. Two non-parametric tests, namely the Wilcoxon rank sum test and the sign test, are also conducted. The results indicate the effectiveness and efficiency of the proposed method over state-of-the-art methods.
doi_str_mv 10.1016/j.eswa.2019.06.050
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2306475745</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0957417419304506</els_id><sourcerecordid>2306475745</sourcerecordid><originalsourceid>FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouK7-AU8Bz62Tpkm24EUWv3AXL3oOaZKuKd1Gk1Tx35uynj3NMLzPzPAgdEmgJED4dV_a-K3KCkhTAi-BwRFakJWgBRcNPUYLaJgoaiLqU3QWYw9ABIBYoOetN3Zw4w5Po7YhKTdio5LCU5yHWz8mi9cqDB673O6CSs6PeG_Tuze48wHrYYrJhpw-RyedGqK9-KtL9HZ_97p-LDYvD0_r202habVKhagJMGEo0a3RHVGMNKxjSoBZUd4yzjgwSoHQqqEV1S23mllD2oqwxkLX0CW6Ouz9CP5zsjHJ3k9hzCdlRYHXgoma5VR1SOngYwy2kx_B7VX4kQTkLE32cpYmZ2kSuMzSMnRzgGz-_8vZIKN2NpsxLlidpPHuP_wX-Tx08Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2306475745</pqid></control><display><type>article</type><title>Modeling uncertain data using Monte Carlo integration method for clustering</title><source>Elsevier</source><creator>Sharma, Krishna Kumar ; Seal, Ayan</creator><creatorcontrib>Sharma, Krishna Kumar ; Seal, Ayan</creatorcontrib><description>•Uncertain data is expressed as pds using a modified Monte-Carlo Integration method.•Three distance metrics have been presented to find the similarity between two pds.•These metrics have been derived from the notion of KL &amp; Jeffreys divergences.•Finally, k-medoid &amp; DBSCAN clustering algorithm has been modified for clustering. Nowadays, data clustering is an important task to the mining research community since the availability of uncertain data is increasing rapidly in many applications such as weather forecasting, business information management systems. 
In this work, proposed Monte Carlo integration based uncertain objects modeling technique is compared with three state-of-the-art methods namely, kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Then Kullback–Leibler and Jeffrey divergences are used to measure the similarity between uncertain objects and merge them with modified DBSCAN and k-medoids clustering algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. All the experiments are performed on one synthesized dataset and three real datasets namely, weather data, Japanese vowels and activity of daily living data. Five performance measures namely, accuracy, precision, recall, F-score, and Jaccard index are considered for comparing proposed method with state-of-the-art methods. Two non-parametric tests namely, Wilcoxon rank sum and sign test are also conducted. These results denote the effectiveness and efficiency of the proposed method over state-of-the-art methods.</description><identifier>ISSN: 0957-4174</identifier><identifier>EISSN: 1873-6793</identifier><identifier>DOI: 10.1016/j.eswa.2019.06.050</identifier><language>eng</language><publisher>New York: Elsevier Ltd</publisher><subject>Algorithms ; Clustering ; Clustering analysis ; Computer simulation ; Datasets ; Heuristic methods ; Information management ; Jeffreys divergence ; Kullback–Leibler divergence ; Management systems ; Meteorological data ; Methods ; Modelling ; Monte Carlo integration ; Monte Carlo simulation ; Test procedures ; Uncertain data modeling ; Vowels ; Weather forecasting</subject><ispartof>Expert systems with applications, 2019-12, Vol.137, p.100-116</ispartof><rights>2019 Elsevier Ltd</rights><rights>Copyright Elsevier BV Dec 15, 
2019</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93</citedby><cites>FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93</cites><orcidid>0000-0002-9939-2926</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Sharma, Krishna Kumar</creatorcontrib><creatorcontrib>Seal, Ayan</creatorcontrib><title>Modeling uncertain data using Monte Carlo integration method for clustering</title><title>Expert systems with applications</title><description>•Uncertain data is expressed as pds using a modified Monte-Carlo Integration method.•Three distance metrics have been presented to find the similarity between two pds.•These metrics have been derived from the notion of KL &amp; Jeffreys divergences.•Finally, k-medoid &amp; DBSCAN clustering algorithm has been modified for clustering. Nowadays, data clustering is an important task to the mining research community since the availability of uncertain data is increasing rapidly in many applications such as weather forecasting, business information management systems. In this work, proposed Monte Carlo integration based uncertain objects modeling technique is compared with three state-of-the-art methods namely, kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Then Kullback–Leibler and Jeffrey divergences are used to measure the similarity between uncertain objects and merge them with modified DBSCAN and k-medoids clustering algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. 
All the experiments are performed on one synthesized dataset and three real datasets namely, weather data, Japanese vowels and activity of daily living data. Five performance measures namely, accuracy, precision, recall, F-score, and Jaccard index are considered for comparing proposed method with state-of-the-art methods. Two non-parametric tests namely, Wilcoxon rank sum and sign test are also conducted. These results denote the effectiveness and efficiency of the proposed method over state-of-the-art methods.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering analysis</subject><subject>Computer simulation</subject><subject>Datasets</subject><subject>Heuristic methods</subject><subject>Information management</subject><subject>Jeffreys divergence</subject><subject>Kullback–Leibler divergence</subject><subject>Management systems</subject><subject>Meteorological data</subject><subject>Methods</subject><subject>Modelling</subject><subject>Monte Carlo integration</subject><subject>Monte Carlo simulation</subject><subject>Test procedures</subject><subject>Uncertain data modeling</subject><subject>Vowels</subject><subject>Weather forecasting</subject><issn>0957-4174</issn><issn>1873-6793</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouK7-AU8Bz62Tpkm24EUWv3AXL3oOaZKuKd1Gk1Tx35uynj3NMLzPzPAgdEmgJED4dV_a-K3KCkhTAi-BwRFakJWgBRcNPUYLaJgoaiLqU3QWYw9ABIBYoOetN3Zw4w5Po7YhKTdio5LCU5yHWz8mi9cqDB673O6CSs6PeG_Tuze48wHrYYrJhpw-RyedGqK9-KtL9HZ_97p-LDYvD0_r202habVKhagJMGEo0a3RHVGMNKxjSoBZUd4yzjgwSoHQqqEV1S23mllD2oqwxkLX0CW6Ouz9CP5zsjHJ3k9hzCdlRYHXgoma5VR1SOngYwy2kx_B7VX4kQTkLE32cpYmZ2kSuMzSMnRzgGz-_8vZIKN2NpsxLlidpPHuP_wX-Tx08Q</recordid><startdate>20191215</startdate><enddate>20191215</enddate><creator>Sharma, Krishna Kumar</creator><creator>Seal, Ayan</creator><general>Elsevier Ltd</general><general>Elsevier 
BV</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-9939-2926</orcidid></search><sort><creationdate>20191215</creationdate><title>Modeling uncertain data using Monte Carlo integration method for clustering</title><author>Sharma, Krishna Kumar ; Seal, Ayan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering analysis</topic><topic>Computer simulation</topic><topic>Datasets</topic><topic>Heuristic methods</topic><topic>Information management</topic><topic>Jeffreys divergence</topic><topic>Kullback–Leibler divergence</topic><topic>Management systems</topic><topic>Meteorological data</topic><topic>Methods</topic><topic>Modelling</topic><topic>Monte Carlo integration</topic><topic>Monte Carlo simulation</topic><topic>Test procedures</topic><topic>Uncertain data modeling</topic><topic>Vowels</topic><topic>Weather forecasting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sharma, Krishna Kumar</creatorcontrib><creatorcontrib>Seal, Ayan</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Expert systems with applications</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sharma, Krishna Kumar</au><au>Seal, Ayan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Modeling uncertain data using Monte Carlo integration method for clustering</atitle><jtitle>Expert systems with applications</jtitle><date>2019-12-15</date><risdate>2019</risdate><volume>137</volume><spage>100</spage><epage>116</epage><pages>100-116</pages><issn>0957-4174</issn><eissn>1873-6793</eissn><abstract>•Uncertain data is expressed as pds using a modified Monte-Carlo Integration method.•Three distance metrics have been presented to find the similarity between two pds.•These metrics have been derived from the notion of KL &amp; Jeffreys divergences.•Finally, k-medoid &amp; DBSCAN clustering algorithm has been modified for clustering. Nowadays, data clustering is an important task to the mining research community since the availability of uncertain data is increasing rapidly in many applications such as weather forecasting, business information management systems. In this work, proposed Monte Carlo integration based uncertain objects modeling technique is compared with three state-of-the-art methods namely, kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Then Kullback–Leibler and Jeffrey divergences are used to measure the similarity between uncertain objects and merge them with modified DBSCAN and k-medoids clustering algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. All the experiments are performed on one synthesized dataset and three real datasets namely, weather data, Japanese vowels and activity of daily living data. Five performance measures namely, accuracy, precision, recall, F-score, and Jaccard index are considered for comparing proposed method with state-of-the-art methods. Two non-parametric tests namely, Wilcoxon rank sum and sign test are also conducted. 
These results denote the effectiveness and efficiency of the proposed method over state-of-the-art methods.</abstract><cop>New York</cop><pub>Elsevier Ltd</pub><doi>10.1016/j.eswa.2019.06.050</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-9939-2926</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0957-4174
ispartof Expert systems with applications, 2019-12, Vol.137, p.100-116
issn 0957-4174
1873-6793
language eng
recordid cdi_proquest_journals_2306475745
source Elsevier
subjects Algorithms
Clustering
Clustering analysis
Computer simulation
Datasets
Heuristic methods
Information management
Jeffreys divergence
Kullback–Leibler divergence
Management systems
Meteorological data
Methods
Modelling
Monte Carlo integration
Monte Carlo simulation
Test procedures
Uncertain data modeling
Vowels
Weather forecasting
title Modeling uncertain data using Monte Carlo integration method for clustering
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T14%3A52%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modeling%20uncertain%20data%20using%20Monte%20Carlo%20integration%20method%20for%20clustering&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Sharma,%20Krishna%20Kumar&rft.date=2019-12-15&rft.volume=137&rft.spage=100&rft.epage=116&rft.pages=100-116&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2019.06.050&rft_dat=%3Cproquest_cross%3E2306475745%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2306475745&rft_id=info:pmid/&rfr_iscdi=true