Experimental Comparisons of Clustering Approaches for Data Representation

Clustering approaches are used extensively in areas such as information retrieval (IR), data integration, document classification, web mining, and query processing, among many other domains and disciplines. Much of the current literature describes clustering algorithms on multivariate data sets. However, there is limited literat...

Bibliographic Details
Published in: ACM computing surveys 2022-03, Vol.55 (3), p.1-33, Article 45
Main Authors: Anand, Sanjay Kumar, Kumar, Suresh
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63
cites cdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63
container_end_page 33
container_issue 3
container_start_page 1
container_title ACM computing surveys
container_volume 55
creator Anand, Sanjay Kumar
Kumar, Suresh
description Clustering approaches are used extensively in areas such as information retrieval (IR), data integration, document classification, web mining, and query processing, among many other domains and disciplines. Much of the current literature describes clustering algorithms on multivariate data sets. However, few studies present them with both exhaustive theoretical analysis and experimental comparisons. This experimental survey covers the basic principles and techniques of eleven clustering algorithms, including their important characteristics, application areas, run-time performance, and the internal, external, and stability validity of cluster quality, on five different data sets. The paper analyses how these algorithms behave on five different multivariate data sets in data representation. To answer this question, we compared the efficiency of eleven clustering approaches on five different data sets using three kinds of validity metrics (internal, external, and stability) and determined the optimal score to identify the feasible solution of each algorithm. In addition, we discuss four popular modern clustering algorithms in theory only. Our experimental results, which cover only the traditional clustering algorithms, showed that different algorithms behaved differently on different data sets in terms of running time (speed), accuracy, and data set size. The study emphasizes the need for more adaptive algorithms and a deliberate balance between running time and accuracy, in both theoretical and implementation aspects.
doi_str_mv 10.1145/3490384
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2775777978</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2775777978</sourcerecordid><originalsourceid>FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63</originalsourceid><addsrcrecordid>eNo9kM1Lw0AQxRdRsFbx7mnBg6fo7Ecym2OJVQsFQfQcJpuNtrTZuJuA_vdNafU0h_ebefMeY9cC7oXQ6YPSOSijT9hEpCkmqLQ4ZRNQGSSgAM7ZRYxrAJBaZBO2mP90Lqy2ru1pwwu_7Sisom8j9w0vNkPsR7X95LOuC57sl4u88YE_Uk_8zXXBxf1mv_LtJTtraBPd1XFO2cfT_L14SZavz4titkxIouyTzElHJs9MpQnAKtRKV1ZIi3Wjbe5kJUkrkxkjVV0bY3WKJCWBsjWqKlNTdnu4Oz70PbjYl2s_hHa0LCViiog5mpG6O1A2-BiDa8puTEnhtxRQ7nsqjz2N5M2BJLv9h_7EHZWyYbM</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2775777978</pqid></control><display><type>article</type><title>Experimental Comparisons of Clustering Approaches for Data Representation</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><source>BSC - Ebsco (Business Source Ultimate)</source><creator>Anand, Sanjay Kumar ; Kumar, Suresh</creator><creatorcontrib>Anand, Sanjay Kumar ; Kumar, Suresh</creatorcontrib><description>Clustering approaches are extensively used by many areas such as IR, Data Integration, Document Classification, Web Mining, Query Processing, and many other domains and disciplines. Nowadays, much literature describes clustering algorithms on multivariate data sets. However, there is limited literature that presented them with exhaustive and extensive theoretical analysis as well as experimental comparisons. This experimental survey paper deals with the basic principle, and techniques used, including important characteristics, application areas, run-time performance, internal, external, and stability validity of cluster quality, etc., on five different data sets of eleven clustering algorithms. 
This paper analyses how these algorithms behave with five different multivariate data sets in data representation. To answer this question, we compared the efficiency of eleven clustering approaches on five different data sets using three validity metrics-internal, external, and stability and found the optimal score to know the feasible solution of each algorithm. In addition, we have also included four popular and modern clustering algorithms with only their theoretical discussion. Our experimental results for only traditional clustering algorithms showed that different algorithms performed different behavior on different data sets in terms of running time (speed), accuracy and, the size of data set. This study emphasized the need for more adaptive algorithms and a deliberate balance between the running time and accuracy with their theoretical as well as implementation aspects.</description><identifier>ISSN: 0360-0300</identifier><identifier>EISSN: 1557-7341</identifier><identifier>DOI: 10.1145/3490384</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Adaptive algorithms ; Algorithms ; Clustering ; Clustering and classification ; Computer science ; Data integration ; Data mining ; Datasets ; Information retrieval ; Information systems ; Multivariate analysis ; Query processing ; Representations ; Retrieval tasks and goals ; Run time (computers) ; Stability</subject><ispartof>ACM computing surveys, 2022-03, Vol.55 (3), p.1-33, Article 45</ispartof><rights>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. 
To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from</rights><rights>Copyright Association for Computing Machinery Mar 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63</citedby><cites>FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>Anand, Sanjay Kumar</creatorcontrib><creatorcontrib>Kumar, Suresh</creatorcontrib><title>Experimental Comparisons of Clustering Approaches for Data Representation</title><title>ACM computing surveys</title><addtitle>ACM CSUR</addtitle><description>Clustering approaches are extensively used by many areas such as IR, Data Integration, Document Classification, Web Mining, Query Processing, and many other domains and disciplines. Nowadays, much literature describes clustering algorithms on multivariate data sets. However, there is limited literature that presented them with exhaustive and extensive theoretical analysis as well as experimental comparisons. This experimental survey paper deals with the basic principle, and techniques used, including important characteristics, application areas, run-time performance, internal, external, and stability validity of cluster quality, etc., on five different data sets of eleven clustering algorithms. This paper analyses how these algorithms behave with five different multivariate data sets in data representation. 
To answer this question, we compared the efficiency of eleven clustering approaches on five different data sets using three validity metrics-internal, external, and stability and found the optimal score to know the feasible solution of each algorithm. In addition, we have also included four popular and modern clustering algorithms with only their theoretical discussion. Our experimental results for only traditional clustering algorithms showed that different algorithms performed different behavior on different data sets in terms of running time (speed), accuracy and, the size of data set. This study emphasized the need for more adaptive algorithms and a deliberate balance between the running time and accuracy with their theoretical as well as implementation aspects.</description><subject>Adaptive algorithms</subject><subject>Algorithms</subject><subject>Clustering</subject><subject>Clustering and classification</subject><subject>Computer science</subject><subject>Data integration</subject><subject>Data mining</subject><subject>Datasets</subject><subject>Information retrieval</subject><subject>Information systems</subject><subject>Multivariate analysis</subject><subject>Query processing</subject><subject>Representations</subject><subject>Retrieval tasks and goals</subject><subject>Run time (computers)</subject><subject>Stability</subject><issn>0360-0300</issn><issn>1557-7341</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNo9kM1Lw0AQxRdRsFbx7mnBg6fo7Ecym2OJVQsFQfQcJpuNtrTZuJuA_vdNafU0h_ebefMeY9cC7oXQ6YPSOSijT9hEpCkmqLQ4ZRNQGSSgAM7ZRYxrAJBaZBO2mP90Lqy2ru1pwwu_7Sisom8j9w0vNkPsR7X95LOuC57sl4u88YE_Uk_8zXXBxf1mv_LtJTtraBPd1XFO2cfT_L14SZavz4titkxIouyTzElHJs9MpQnAKtRKV1ZIi3Wjbe5kJUkrkxkjVV0bY3WKJCWBsjWqKlNTdnu4Oz70PbjYl2s_hHa0LCViiog5mpG6O1A2-BiDa8puTEnhtxRQ7nsqjz2N5M2BJLv9h_7EHZWyYbM</recordid><startdate>20220330</startdate><enddate>20220330</enddate><creator>Anand, Sanjay 
Kumar</creator><creator>Kumar, Suresh</creator><general>ACM</general><general>Association for Computing Machinery</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20220330</creationdate><title>Experimental Comparisons of Clustering Approaches for Data Representation</title><author>Anand, Sanjay Kumar ; Kumar, Suresh</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptive algorithms</topic><topic>Algorithms</topic><topic>Clustering</topic><topic>Clustering and classification</topic><topic>Computer science</topic><topic>Data integration</topic><topic>Data mining</topic><topic>Datasets</topic><topic>Information retrieval</topic><topic>Information systems</topic><topic>Multivariate analysis</topic><topic>Query processing</topic><topic>Representations</topic><topic>Retrieval tasks and goals</topic><topic>Run time (computers)</topic><topic>Stability</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Anand, Sanjay Kumar</creatorcontrib><creatorcontrib>Kumar, Suresh</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>ACM computing surveys</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Anand, Sanjay Kumar</au><au>Kumar, Suresh</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Experimental Comparisons of Clustering Approaches for Data Representation</atitle><jtitle>ACM computing surveys</jtitle><stitle>ACM CSUR</stitle><date>2022-03-30</date><risdate>2022</risdate><volume>55</volume><issue>3</issue><spage>1</spage><epage>33</epage><pages>1-33</pages><artnum>45</artnum><issn>0360-0300</issn><eissn>1557-7341</eissn><abstract>Clustering approaches are extensively used by many areas such as IR, Data Integration, Document Classification, Web Mining, Query Processing, and many other domains and disciplines. Nowadays, much literature describes clustering algorithms on multivariate data sets. However, there is limited literature that presented them with exhaustive and extensive theoretical analysis as well as experimental comparisons. This experimental survey paper deals with the basic principle, and techniques used, including important characteristics, application areas, run-time performance, internal, external, and stability validity of cluster quality, etc., on five different data sets of eleven clustering algorithms. This paper analyses how these algorithms behave with five different multivariate data sets in data representation. To answer this question, we compared the efficiency of eleven clustering approaches on five different data sets using three validity metrics-internal, external, and stability and found the optimal score to know the feasible solution of each algorithm. In addition, we have also included four popular and modern clustering algorithms with only their theoretical discussion. Our experimental results for only traditional clustering algorithms showed that different algorithms performed different behavior on different data sets in terms of running time (speed), accuracy and, the size of data set. 
This study emphasized the need for more adaptive algorithms and a deliberate balance between the running time and accuracy with their theoretical as well as implementation aspects.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3490384</doi><tpages>33</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0360-0300
ispartof ACM computing surveys, 2022-03, Vol.55 (3), p.1-33, Article 45
issn 0360-0300
1557-7341
language eng
recordid cdi_proquest_journals_2775777978
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); BSC - Ebsco (Business Source Ultimate)
subjects Adaptive algorithms
Algorithms
Clustering
Clustering and classification
Computer science
Data integration
Data mining
Datasets
Information retrieval
Information systems
Multivariate analysis
Query processing
Representations
Retrieval tasks and goals
Run time (computers)
Stability
title Experimental Comparisons of Clustering Approaches for Data Representation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T13%3A43%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Experimental%20Comparisons%20of%20Clustering%20Approaches%20for%20Data%20Representation&rft.jtitle=ACM%20computing%20surveys&rft.au=Anand,%20Sanjay%20Kumar&rft.date=2022-03-30&rft.volume=55&rft.issue=3&rft.spage=1&rft.epage=33&rft.pages=1-33&rft.artnum=45&rft.issn=0360-0300&rft.eissn=1557-7341&rft_id=info:doi/10.1145/3490384&rft_dat=%3Cproquest_cross%3E2775777978%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2775777978&rft_id=info:pmid/&rfr_iscdi=true