Mutual equidistant-scattering criterion: A new index for crisp clustering

Bibliographic Details
Published in: Expert Systems with Applications, 2019-08, Vol. 128, p. 225-245
Main Authors: Flexa, Caio; Santos, Reginaldo; Gomes, Walisson; Sales, Claudomiro; Costa, João C.W.A.
Format: Article
Language: English
Description
Summary:
• A new non-parametric internal validity index is proposed for crisp clustering.
• The index is based on within-cluster mutual equidistant-scattering.
• It is shown to be effective on 4 real-world and 26 synthetic benchmark data sets.
• Seven other indexes are used in comparisons for detecting the number of clusters.
• Friedman's test corroborates that the proposed index yields better results.

Clustering algorithms usually assume that the number K of clusters is known, although there is often no prior knowledge about the underlying data set. Consequently, the significance of the groups found needs to be validated. Cluster validity indexes are commonly used to validate clustering results. However, most of them are considered to depend on the number of data objects, and they often tend to ignore small and low-density groups. Furthermore, suboptimal clustering solutions are frequently selected when clusters overlap to some degree or are poorly separated. Thus, we propose a new non-parametric internal validity index for crisp clustering, based on within-cluster mutual equidistant-scattering. Eight validity indexes (the proposed one and seven others) were analysed for detecting the number of clusters in a data set. Experiments on both synthetic and real-world data show the effectiveness and reliability of our approach for evaluating the hyperparameter K.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2019.03.027
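
The summary above describes using an internal validity index to pick the number K of clusters. The record does not define the proposed mutual equidistant-scattering index, so the following is only a minimal sketch of the general selection procedure with a stand-in: the classic mean silhouette width, not the authors' index. The helper names (`farthest_first_centers`, `kmeans_labels`, `mean_silhouette`, `choose_k`, `make_blobs`), the farthest-first initialisation, and the toy data are all assumptions made for illustration.

```python
import math
import random


def farthest_first_centers(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans_labels(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one crisp cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Stand-in internal validity index (higher is better); singletons score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Cluster once per candidate K and keep the K with the best index value."""
    scores = {k: mean_silhouette(points, kmeans_labels(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


def make_blobs(centers, n_per_blob=30, spread=0.5, seed=0):
    """Toy 2-D data: one Gaussian blob per center (illustrative only)."""
    rng = random.Random(seed)
    return [
        (rng.gauss(cx, spread), rng.gauss(cy, spread))
        for cx, cy in centers
        for _ in range(n_per_blob)
    ]


points = make_blobs([(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)])
best_k, scores = choose_k(points, range(2, 7))
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Any other internal index could be swapped in for `mean_silhouette` as long as `choose_k` maximises (or minimises) it consistently. The toy blobs are compact and well separated, which is the easy case; the summary's concerns about overlapping, small, or low-density groups are exactly where the choice of index matters.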