Experimental Comparisons of Clustering Approaches for Data Representation
Clustering approaches are used extensively in many areas, such as information retrieval (IR), data integration, document classification, web mining, and query processing, as well as in many other domains and disciplines. Much of the current literature describes clustering algorithms on multivariate data sets. However, there is limited literat...
Published in: | ACM computing surveys 2022-03, Vol.55 (3), p.1-33, Article 45 |
---|---|
Main Authors: | Anand, Sanjay Kumar ; Kumar, Suresh |
Format: | Article |
Language: | English |
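Subjects: | Adaptive algorithms ; Algorithms ; Clustering ; Clustering and classification ; Computer science ; Data integration ; Data mining ; Datasets ; Information retrieval ; Information systems ; Multivariate analysis ; Query processing ; Representations ; Retrieval tasks and goals ; Run time (computers) ; Stability |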
cited_by | cdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63 |
---|---|
cites | cdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63 |
container_end_page | 33 |
container_issue | 3 |
container_start_page | 1 |
container_title | ACM computing surveys |
container_volume | 55 |
creator | Anand, Sanjay Kumar ; Kumar, Suresh |
description | Clustering approaches are used extensively in many areas, such as information retrieval (IR), data integration, document classification, web mining, and query processing, as well as in many other domains and disciplines. Much of the current literature describes clustering algorithms on multivariate data sets. However, few works present them with both an exhaustive theoretical analysis and experimental comparisons. This experimental survey covers the basic principles and techniques of eleven clustering algorithms, including their important characteristics, application areas, run-time performance, and cluster quality as measured by internal, external, and stability validity, on five different data sets. The paper analyses how these algorithms behave on five different multivariate data sets in data representation. To answer this question, we compared the efficiency of the eleven clustering approaches on the five data sets using three validity metrics (internal, external, and stability) and found the optimal score to identify the feasible solution of each algorithm. In addition, we include four popular, modern clustering algorithms with a theoretical discussion only. Our experimental results, which cover only the traditional clustering algorithms, showed that different algorithms behaved differently on different data sets in terms of running time (speed), accuracy, and data set size. This study emphasizes the need for more adaptive algorithms and a deliberate balance between running time and accuracy, in both their theoretical and implementation aspects. |
doi_str_mv | 10.1145/3490384 |
format | article |
fullrecord | Publisher: New York, NY: ACM. Creation date: 2022-03-30. EISSN: 1557-7341. Rights: Copyright Association for Computing Machinery Mar 2023. Peer reviewed. |
fulltext | fulltext |
identifier | ISSN: 0360-0300 |
ispartof | ACM computing surveys, 2022-03, Vol.55 (3), p.1-33, Article 45 |
issn | 0360-0300 ; 1557-7341 |
language | eng |
recordid | cdi_proquest_journals_2775777978 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); BSC - Ebsco (Business Source Ultimate) |
subjects | Adaptive algorithms ; Algorithms ; Clustering ; Clustering and classification ; Computer science ; Data integration ; Data mining ; Datasets ; Information retrieval ; Information systems ; Multivariate analysis ; Query processing ; Representations ; Retrieval tasks and goals ; Run time (computers) ; Stability |
title | Experimental Comparisons of Clustering Approaches for Data Representation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T13%3A43%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Experimental%20Comparisons%20of%20Clustering%20Approaches%20for%20Data%20Representation&rft.jtitle=ACM%20computing%20surveys&rft.au=Anand,%20Sanjay%20Kumar&rft.date=2022-03-30&rft.volume=55&rft.issue=3&rft.spage=1&rft.epage=33&rft.pages=1-33&rft.artnum=45&rft.issn=0360-0300&rft.eissn=1557-7341&rft_id=info:doi/10.1145/3490384&rft_dat=%3Cproquest_cross%3E2775777978%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a272t-6e2ea8968b4a00c37434bc12c7df4c9e2b2a43868823dd88c457a22a03cd73b63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2775777978&rft_id=info:pmid/&rfr_iscdi=true |
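
Illustration (not part of the catalog record): the description above mentions comparing clustering algorithms by running time and by internal and external validity. The record does not say which specific indices or implementations the authors used, so the following is only a minimal sketch under assumptions: a plain k-means stands in for one algorithm, the silhouette coefficient stands in for an internal metric, and the Rand index stands in for an external metric. The toy points, labels, and all function names are hypothetical, and the stability metric is omitted.

```python
import math
import time
from itertools import combinations


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Internal validity (needs no ground truth): mean silhouette, in [-1, 1]."""
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            if ds:
                mean_dist[c] = sum(ds) / len(ds)
        a = mean_dist.get(labels[i])  # mean distance within own cluster
        others = [d for c, d in mean_dist.items() if c != labels[i]]
        if a is None or not others or max(a, min(others)) == 0:
            scores.append(0.0)  # singleton, single cluster, or coincident points
            continue
        b = min(others)  # mean distance to the nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def rand_index(truth, pred):
    """External validity: fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: two well-separated groups with known labels.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
TRUTH = [0, 0, 0, 1, 1, 1]

start = time.perf_counter()
LABELS = kmeans(POINTS, k=2)
ELAPSED = time.perf_counter() - start  # running time of the clustering step

print("running time (s):", ELAPSED)
print("silhouette (internal):", silhouette(POINTS, LABELS))
print("Rand index (external):", rand_index(TRUTH, LABELS))
```

Running the same three measurements for several algorithms over several data sets would give a comparison table of the kind the description refers to; the choice of indices here is illustrative only.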