
Modeling uncertain data using Monte Carlo integration method for clustering

• Uncertain data is expressed as pds using a modified Monte-Carlo Integration method.
• Three distance metrics have been presented to find the similarity between two pds.
• These metrics have been derived from the notion of KL & Jeffreys divergences.
• Finally, k-medoid & DBSCAN clustering algorith...
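The highlights above name Monte Carlo integration together with the KL and Jeffreys divergences. As a rough illustration only, and not the authors' modified method, the sketch below estimates both divergences by plain Monte Carlo sampling. It assumes each uncertain object is a one-dimensional Gaussian given as a hypothetical (mean, std) pair; the function names are invented for this example, and the closed-form Gaussian KL is included only as a sanity check.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def kl_monte_carlo(p, q, n_samples=100_000, seed=0):
    """Estimate KL(p || q) = E_p[log p(x) - log q(x)] by sampling x from p.

    p and q are (mean, std) tuples; this Gaussian assumption is made for
    the illustration and is not taken from the catalogued article.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(p[0], p[1])
        total += gaussian_logpdf(x, *p) - gaussian_logpdf(x, *q)
    return total / n_samples


def jeffreys_monte_carlo(p, q, n_samples=100_000, seed=0):
    """Jeffreys divergence: the symmetrised sum KL(p || q) + KL(q || p)."""
    forward = kl_monte_carlo(p, q, n_samples, seed)
    backward = kl_monte_carlo(q, p, n_samples, seed + 1)
    return forward + backward


def kl_closed_form(p, q):
    """Exact KL divergence between two 1-D Gaussians, used as a check."""
    (m1, s1), (m2, s2) = p, q
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


# Two hypothetical uncertain objects.
obj_a = (0.0, 1.0)
obj_b = (1.0, 2.0)
kl_ab = kl_monte_carlo(obj_a, obj_b)
jeffreys_ab = jeffreys_monte_carlo(obj_a, obj_b)
```

Because the Jeffreys divergence is symmetric, a pairwise matrix of such values could in principle be passed to a k-medoids or DBSCAN implementation that accepts precomputed distances; how the article actually modifies those algorithms is not stated in this record.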

Bibliographic Details
Published in:	Expert systems with applications 2019-12, Vol.137, p.100-116
Main Authors:	Sharma, Krishna Kumar; Seal, Ayan
Format:	Article
Language:	English
Subjects:	Algorithms; Clustering; Clustering analysis; Computer simulation; Datasets; Heuristic methods; Information management; Jeffreys divergence; Kullback–Leibler divergence; Management systems; Meteorological data; Methods; Modelling; Monte Carlo integration; Monte Carlo simulation; Test procedures; Uncertain data modeling; Vowels; Weather forecasting

cited_by	cdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93
cites	cdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93
container_end_page	116
container_issue
container_start_page	100
container_title	Expert systems with applications
container_volume	137
creator	Sharma, Krishna Kumar; Seal, Ayan
description	• Uncertain data is expressed as pds using a modified Monte Carlo integration method. • Three distance metrics are presented to measure the similarity between two pds. • These metrics are derived from the Kullback–Leibler (KL) and Jeffreys divergences. • Finally, the k-medoids and DBSCAN clustering algorithms are modified for clustering. Data clustering is an important task for the data mining research community because uncertain data is becoming rapidly more available in applications such as weather forecasting and business information management systems. In this work, the proposed Monte Carlo integration based technique for modeling uncertain objects is compared with three state-of-the-art methods, namely kernel density estimation, Dempster–Shafer, and Monte Carlo simulation. Kullback–Leibler and Jeffreys divergences are then used to measure the similarity between uncertain objects, which are merged using modified DBSCAN and k-medoids clustering algorithms. A heuristic algorithm is proposed to find the optimum radius, which is one of the inputs of DBSCAN. All experiments are performed on one synthesized dataset and three real datasets, namely weather data, Japanese vowels, and activity of daily living data. Five performance measures, namely accuracy, precision, recall, F-score, and Jaccard index, are used to compare the proposed method with the state-of-the-art methods. Two non-parametric tests, namely the Wilcoxon rank sum test and the sign test, are also conducted. The results indicate the effectiveness and efficiency of the proposed method relative to the state-of-the-art methods.
doi_str_mv	10.1016/j.eswa.2019.06.050
format	article
publisher	New York: Elsevier Ltd
rights	2019 Elsevier Ltd
orcid	https://orcid.org/0000-0002-9939-2926
tpages	17
fulltext	fulltext
identifier	ISSN: 0957-4174
ispartof	Expert systems with applications, 2019-12, Vol.137, p.100-116
issn	0957-4174 1873-6793
language	eng
recordid	cdi_proquest_journals_2306475745
source	Elsevier
subjects	Algorithms; Clustering; Clustering analysis; Computer simulation; Datasets; Heuristic methods; Information management; Jeffreys divergence; Kullback–Leibler divergence; Management systems; Meteorological data; Methods; Modelling; Monte Carlo integration; Monte Carlo simulation; Test procedures; Uncertain data modeling; Vowels; Weather forecasting
title	Modeling uncertain data using Monte Carlo integration method for clustering
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T14%3A52%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Modeling%20uncertain%20data%20using%20Monte%20Carlo%20integration%20method%20for%20clustering&rft.jtitle=Expert%20systems%20with%20applications&rft.au=Sharma,%20Krishna%20Kumar&rft.date=2019-12-15&rft.volume=137&rft.spage=100&rft.epage=116&rft.pages=100-116&rft.issn=0957-4174&rft.eissn=1873-6793&rft_id=info:doi/10.1016/j.eswa.2019.06.050&rft_dat=%3Cproquest_cross%3E2306475745%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c328t-741057d31cbdcf1a5195f5a70d836b5656053301329323cb6ec5ed1b2159e0f93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2306475745&rft_id=info:pmid/&rfr_iscdi=true