Cluster validation in problems with increasing dimensionality and unbalanced clusters

Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster valida...

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2014-01, Vol. 123, p. 33-39
Main Authors: LAGO-FERNANDEZ, Luis F; ARAGON, Jesús; MARTINEZ-MUNOZ, Gonzalo; GONZALEZ, Ana M; SANCHEZ-MONTANES, Manuel
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483
cites cdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483
container_end_page 39
container_issue
container_start_page 33
container_title Neurocomputing (Amsterdam)
container_volume 123
creator LAGO-FERNANDEZ, Luis F
ARAGON, Jesús
MARTINEZ-MUNOZ, Gonzalo
GONZALEZ, Ana M
SANCHEZ-MONTANES, Manuel
description Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster validation favors data partitions into compact clusters that do not strongly overlap. Its evaluation is quite simple and it has been shown to perform better than other state-of-the-art techniques. However, like many other cluster validation approaches, it presents problems when validating partitions where some regions contain only a few data points. Different heuristics have been proposed to cope with this problem, and they are systematically analyzed in this paper. We study the performance of AIC, BIC, and four negentropy-based validation approaches in synthetic clustering problems of increasing dimensionality, with unbalanced clusters and different degrees of overlap. Our results suggest that negentropy-based validation techniques outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.
doi_str_mv 10.1016/j.neucom.2012.09.044
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1671611977</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0925231213003603</els_id><sourcerecordid>1671611977</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483</originalsourceid><addsrcrecordid>eNp9kE1LxDAQhoMouH78Aw-9CF62JtO0TS6CLH7Bghf3HNJ8aJY2XZN2Zf-9KV08Sg4D4ZmZdx6EbgjOCSbV_Tb3ZlR9lwMmkGOeY0pP0IKwGpYMWHWKFphDuYSCwDm6iHGLMakJ8AXarNoxDiZke9k6LQfX-8z5bBf6pjVdzH7c8JU-VDAyOv-ZadcZHxOV8OGQSa-z0TeylV4Znal5WLxCZ1a20Vwf6yXaPD99rF6X6_eXt9XjeqmKig8pG1gFFgAopawmGBQ3DUipK1AaF1xrq5qqKZmlpQRaSFtTrVhtS1aUlBWX6G6em_J-jyYOonNRmTbFMf0YBalqUhHC6zqhdEZV6GMMxopdcJ0MB0GwmCyKrZgtismiwFwki6nt9rhBRiVbG9KhLv71ApteSRL3MHMmnbt3JoionJmkuGDUIHTv_l_0C7E0i4c</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1671611977</pqid></control><display><type>article</type><title>Cluster validation in problems with increasing dimensionality and unbalanced clusters</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>LAGO-FERNANDEZ, Luis F ; ARAGON, Jesús ; MARTINEZ-MUNOZ, Gonzalo ; GONZALEZ, Ana M ; SANCHEZ-MONTANES, Manuel</creator><creatorcontrib>LAGO-FERNANDEZ, Luis F ; ARAGON, Jesús ; MARTINEZ-MUNOZ, Gonzalo ; GONZALEZ, Ana M ; SANCHEZ-MONTANES, Manuel</creatorcontrib><description>Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster validation favors data partitions into compact clusters which are not strongly overlapped. Its evaluation is quite simple and it has been shown to perform better than other state of the art techniques. 
However, like many other cluster validation approaches, it presents problems when validating partitions where some regions contain only a few data points. Different heuristics have been proposed to cope with this problem, which are systematically analyzed in this paper. We study the performance of AIC, BIC, and four negentropy-based validation approaches in synthetic clustering problems of increasing dimensionality, with unbalanced clusters and different degree of overlapping. Our results suggest that negentropy-based validation techniques outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.</description><identifier>ISSN: 0925-2312</identifier><identifier>EISSN: 1872-8286</identifier><identifier>DOI: 10.1016/j.neucom.2012.09.044</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Applied sciences ; Cluster validation ; Clustering ; Clusters ; Computer science; control theory; systems ; Data points ; Data processing. List processing. Character string processing ; Exact sciences and technology ; Heuristic ; Memory organisation. 
Data processing ; Model selection ; Partitions ; Software ; State of the art</subject><ispartof>Neurocomputing (Amsterdam), 2014-01, Vol.123, p.33-39</ispartof><rights>2013 Elsevier B.V.</rights><rights>2015 INIST-CNRS</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483</citedby><cites>FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,314,780,784,789,790,23930,23931,25140,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=28282851$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>LAGO-FERNANDEZ, Luis F</creatorcontrib><creatorcontrib>ARAGON, Jesús</creatorcontrib><creatorcontrib>MARTINEZ-MUNOZ, Gonzalo</creatorcontrib><creatorcontrib>GONZALEZ, Ana M</creatorcontrib><creatorcontrib>SANCHEZ-MONTANES, Manuel</creatorcontrib><title>Cluster validation in problems with increasing dimensionality and unbalanced clusters</title><title>Neurocomputing (Amsterdam)</title><description>Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster validation favors data partitions into compact clusters which are not strongly overlapped. Its evaluation is quite simple and it has been shown to perform better than other state of the art techniques. 
However, like many other cluster validation approaches, it presents problems when validating partitions where some regions contain only a few data points. Different heuristics have been proposed to cope with this problem, which are systematically analyzed in this paper. We study the performance of AIC, BIC, and four negentropy-based validation approaches in synthetic clustering problems of increasing dimensionality, with unbalanced clusters and different degree of overlapping. Our results suggest that negentropy-based validation techniques outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.</description><subject>Applied sciences</subject><subject>Cluster validation</subject><subject>Clustering</subject><subject>Clusters</subject><subject>Computer science; control theory; systems</subject><subject>Data points</subject><subject>Data processing. List processing. Character string processing</subject><subject>Exact sciences and technology</subject><subject>Heuristic</subject><subject>Memory organisation. 
Data processing</subject><subject>Model selection</subject><subject>Partitions</subject><subject>Software</subject><subject>State of the art</subject><issn>0925-2312</issn><issn>1872-8286</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><recordid>eNp9kE1LxDAQhoMouH78Aw-9CF62JtO0TS6CLH7Bghf3HNJ8aJY2XZN2Zf-9KV08Sg4D4ZmZdx6EbgjOCSbV_Tb3ZlR9lwMmkGOeY0pP0IKwGpYMWHWKFphDuYSCwDm6iHGLMakJ8AXarNoxDiZke9k6LQfX-8z5bBf6pjVdzH7c8JU-VDAyOv-ZadcZHxOV8OGQSa-z0TeylV4Znal5WLxCZ1a20Vwf6yXaPD99rF6X6_eXt9XjeqmKig8pG1gFFgAopawmGBQ3DUipK1AaF1xrq5qqKZmlpQRaSFtTrVhtS1aUlBWX6G6em_J-jyYOonNRmTbFMf0YBalqUhHC6zqhdEZV6GMMxopdcJ0MB0GwmCyKrZgtismiwFwki6nt9rhBRiVbG9KhLv71ApteSRL3MHMmnbt3JoionJmkuGDUIHTv_l_0C7E0i4c</recordid><startdate>20140110</startdate><enddate>20140110</enddate><creator>LAGO-FERNANDEZ, Luis F</creator><creator>ARAGON, Jesús</creator><creator>MARTINEZ-MUNOZ, Gonzalo</creator><creator>GONZALEZ, Ana M</creator><creator>SANCHEZ-MONTANES, Manuel</creator><general>Elsevier B.V</general><general>Elsevier</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20140110</creationdate><title>Cluster validation in problems with increasing dimensionality and unbalanced clusters</title><author>LAGO-FERNANDEZ, Luis F ; ARAGON, Jesús ; MARTINEZ-MUNOZ, Gonzalo ; GONZALEZ, Ana M ; SANCHEZ-MONTANES, Manuel</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Applied sciences</topic><topic>Cluster validation</topic><topic>Clustering</topic><topic>Clusters</topic><topic>Computer science; control theory; systems</topic><topic>Data 
points</topic><topic>Data processing. List processing. Character string processing</topic><topic>Exact sciences and technology</topic><topic>Heuristic</topic><topic>Memory organisation. Data processing</topic><topic>Model selection</topic><topic>Partitions</topic><topic>Software</topic><topic>State of the art</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>LAGO-FERNANDEZ, Luis F</creatorcontrib><creatorcontrib>ARAGON, Jesús</creatorcontrib><creatorcontrib>MARTINEZ-MUNOZ, Gonzalo</creatorcontrib><creatorcontrib>GONZALEZ, Ana M</creatorcontrib><creatorcontrib>SANCHEZ-MONTANES, Manuel</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Neurocomputing (Amsterdam)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>LAGO-FERNANDEZ, Luis F</au><au>ARAGON, Jesús</au><au>MARTINEZ-MUNOZ, Gonzalo</au><au>GONZALEZ, Ana M</au><au>SANCHEZ-MONTANES, Manuel</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Cluster validation in problems with increasing dimensionality and unbalanced clusters</atitle><jtitle>Neurocomputing (Amsterdam)</jtitle><date>2014-01-10</date><risdate>2014</risdate><volume>123</volume><spage>33</spage><epage>39</epage><pages>33-39</pages><issn>0925-2312</issn><eissn>1872-8286</eissn><abstract>Cluster validation methods provide measures to evaluate the quality of a clustering partition on a given data set, and to determine the correct number of clusters. 
Recently, a new set of validation techniques based on the clusters' negentropy has been introduced. Negentropy-based cluster validation favors data partitions into compact clusters which are not strongly overlapped. Its evaluation is quite simple and it has been shown to perform better than other state of the art techniques. However, like many other cluster validation approaches, it presents problems when validating partitions where some regions contain only a few data points. Different heuristics have been proposed to cope with this problem, which are systematically analyzed in this paper. We study the performance of AIC, BIC, and four negentropy-based validation approaches in synthetic clustering problems of increasing dimensionality, with unbalanced clusters and different degree of overlapping. Our results suggest that negentropy-based validation techniques outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in most real applications.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.neucom.2012.09.044</doi><tpages>7</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0925-2312
ispartof Neurocomputing (Amsterdam), 2014-01, Vol.123, p.33-39
issn 0925-2312
1872-8286
language eng
recordid cdi_proquest_miscellaneous_1671611977
source ScienceDirect Freedom Collection 2022-2024
subjects Applied sciences
Cluster validation
Clustering
Clusters
Computer science; control theory; systems
Data points
Data processing. List processing. Character string processing
Exact sciences and technology
Heuristic
Memory organisation. Data processing
Model selection
Partitions
Software
State of the art
title Cluster validation in problems with increasing dimensionality and unbalanced clusters
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T12%3A29%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Cluster%20validation%20in%20problems%20with%20increasing%20dimensionality%20and%20unbalanced%20clusters&rft.jtitle=Neurocomputing%20(Amsterdam)&rft.au=LAGO-FERNANDEZ,%20Luis%20F&rft.date=2014-01-10&rft.volume=123&rft.spage=33&rft.epage=39&rft.pages=33-39&rft.issn=0925-2312&rft.eissn=1872-8286&rft_id=info:doi/10.1016/j.neucom.2012.09.044&rft_dat=%3Cproquest_cross%3E1671611977%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c369t-822fc2f22244487102c9eb2aad62cd039ddfcb6b58f45a243af74dc87f5835483%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671611977&rft_id=info:pmid/&rfr_iscdi=true