Cluster validation in problems with increasing dimensionality and unbalanced clusters
Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster valida...
Published in: | Neurocomputing (Amsterdam) 2014-01, Vol.123, p.33-39 |
---|---|
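Main Authors: | LAGO-FERNANDEZ, Luis F ; ARAGON, Jesús ; MARTINEZ-MUNOZ, Gonzalo ; GONZALEZ, Ana M ; SANCHEZ-MONTANES, Manuel |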
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483 |
---|---|
cites | cdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483 |
container_end_page | 39 |
container_issue | |
container_start_page | 33 |
container_title | Neurocomputing (Amsterdam) |
container_volume | 123 |
creator | LAGO-FERNANDEZ, Luis F ; ARAGON, Jesús ; MARTINEZ-MUNOZ, Gonzalo ; GONZALEZ, Ana M ; SANCHEZ-MONTANES, Manuel |
description | Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster validation favors data partitions into compact clusters that do not overlap strongly. Its evaluation is quite simple and it has been shown to perform better than other state-of-the-art techniques. However, like many other cluster validation approaches, it presents problems when validating partitions where some regions contain only a few data points. Different heuristics have been proposed to cope with this problem, which are systematically analyzed in this paper. We study the performance of AIC, BIC, and four negentropy-based validation approaches in synthetic clustering problems of increasing dimensionality, with unbalanced clusters and different degrees of overlap. Our results suggest that negentropy-based validation techniques outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications. |
doi_str_mv | 10.1016/j.neucom.2012.09.044 |
format | article |
backlink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28282851 (View record in Pascal Francis) |
collection | Pascal-Francis ; CrossRef ; Computer and Information Systems Abstracts ; Technology Research Database ; ProQuest Computer Science Collection ; Advanced Technologies Database with Aerospace ; Computer and Information Systems Abstracts Academic ; Computer and Information Systems Abstracts Professional |
date | 2014-01-10 |
els_id | S0925231213003603 |
publisher | Amsterdam: Elsevier B.V |
rights | 2013 Elsevier B.V. ; 2015 INIST-CNRS |
toplevel | peer_reviewed ; online_resources |
tpages | 7 |
fulltext | fulltext |
identifier | ISSN: 0925-2312 |
ispartof | Neurocomputing (Amsterdam), 2014-01, Vol.123, p.33-39 |
issn | 0925-2312 (ISSN) ; 1872-8286 (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1671611977 |
source | ScienceDirect Freedom Collection 2022-2024 |
subjects | Applied sciences ; Cluster validation ; Clustering ; Clusters ; Computer science; control theory; systems ; Data points ; Data processing. List processing. Character string processing ; Exact sciences and technology ; Heuristic ; Memory organisation. Data processing ; Model selection ; Partitions ; Software ; State of the art |
title | Cluster validation in problems with increasing dimensionality and unbalanced clusters |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T12%3A29%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cluster%20validation%20in%20problems%20with%20increasing%20dimensionality%20and%20unbalanced%20clusters&rft.jtitle=Neurocomputing%20(Amsterdam)&rft.au=LAGO-FERNANDEZ,%20Luis%20F&rft.date=2014-01-10&rft.volume=123&rft.spage=33&rft.epage=39&rft.pages=33-39&rft.issn=0925-2312&rft.eissn=1872-8286&rft_id=info:doi/10.1016/j.neucom.2012.09.044&rft_dat=%3Cproquest_cross%3E1671611977%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671611977&rft_id=info:pmid/&rfr_iscdi=true |