
Performance Analysis of K-Means Seeding Algorithms

Bibliographic Details
Main Authors: Ortiz-Bejar, Jose, Tellez, Eric S., Graff, Mario, Ortiz-Bejar, Jesus, Jacobo, Jaime Cerda, Zamora-Mendez, Alejandro
Format: Conference Proceeding
Language: English
Description
Summary: K-Means is one of the most widely used clustering algorithms. However, because its optimization process is based on a greedy iterative gradient descent, K-Means is sensitive to the initial set of centers. It has been shown that a bad initial set of centroids can reduce cluster quality. Therefore, numerous initialization methods have been developed to prevent poor K-Means clustering performance. Nonetheless, all of these initialization methods are usually validated using the Sum of Squared Errors (SSE) as the quality measure. In this study, we evaluate three state-of-the-art initialization methods with three different quality measures, i.e., SSE, the Silhouette Coefficient, and the Adjusted Rand Index. The analysis is carried out on seventeen benchmarks. We provide new insight into the performance of initialization methods that are traditionally overlooked; our results show a high correlation between different initialization methods and fitness functions. These results may help optimize K-Means, with low effort, for topological structures beyond those covered by optimizing SSE.
ISSN: 2573-0770
DOI: 10.1109/ROPEC48299.2019.9057044
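
The summary does not name the three initialization methods or give implementation details. The following is a minimal, standard-library-only Python sketch of the kind of evaluation it describes: seed K-Means, refine with plain Lloyd iterations, then score the result with SSE, the Silhouette Coefficient, and the Adjusted Rand Index. Uniform random seeding and k-means++ are assumed here purely as illustrative seeding strategies. They are not necessarily the methods evaluated in the paper, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_seeds(points, k, rng):
    """Illustrative seeding 1: k distinct data points chosen uniformly."""
    return rng.sample(points, k)


def kmeanspp_seeds(points, k, rng):
    """Illustrative seeding 2 (k-means++): each new center is sampled with
    probability proportional to its squared distance to the nearest center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
        else:  # guard against floating-point round-off at the upper end
            centers.append(points[-1])
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def lloyd(points, seeds, max_iter=100):
    """Plain Lloyd iterations from the given seeds; returns (centers, labels)."""
    centers = [tuple(c) for c in seeds]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def sse(points, centers, labels):
    """Sum of Squared Errors: squared distance of each point to its center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette (Euclidean distance); needs at least two clusters.
    Points in singleton clusters score 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index from pair counts of the contingency table."""
    index = sum(math.comb(v, 2) for v in Counter(zip(truth, pred)).values())
    sum_a = sum(math.comb(v, 2) for v in Counter(truth).values())
    sum_b = sum(math.comb(v, 2) for v in Counter(pred).values())
    expected = sum_a * sum_b / math.comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate labelings
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    rng = random.Random(42)
    points, truth = [], []
    for label, (cx, cy) in enumerate([(0, 0), (6, 0), (3, 5)]):  # synthetic blobs
        for _ in range(30):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            truth.append(label)
    for name, seeder in [("random", random_seeds), ("k-means++", kmeanspp_seeds)]:
        centers, labels = lloyd(points, seeder(points, 3, rng))
        print(f"{name:10s} SSE={sse(points, centers, labels):.2f} "
              f"silhouette={silhouette(points, labels):.3f} "
              f"ARI={adjusted_rand_index(truth, labels):.3f}")
```

SSE only rewards compact clusters. The silhouette also accounts for separation between clusters, and the ARI compares the result against known labels. Reporting all three for the same seeding can therefore rank initialization methods differently than SSE alone.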