Performance Analysis of K-Means Seeding Algorithms
Main Authors: | , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | K-Means is one of the most widely used clustering algorithms. However, because its optimization process is based on a greedy, iterated gradient descent, K-Means is sensitive to the initial set of centers. It has been shown that a poor initial set of centroids can reduce cluster quality. Numerous initialization methods have therefore been developed to prevent poor K-Means clustering performance. Nonetheless, these initialization methods are usually validated using only the Sum of Squared Errors (SSE) as the quality measure. In this study, we evaluate three state-of-the-art initialization methods with three different quality measures: SSE, the Silhouette Coefficient, and the Adjusted Rand Index. The analysis is carried out on seventeen benchmarks. We provide new insight into the performance of initialization methods that are traditionally overlooked; our results describe the high correlation between different initialization methods and fitness functions. These results may help to optimize K-Means, with low effort, for topological structures beyond those covered by optimizing SSE. |
---|---|
ISSN: | 2573-0770 |
DOI: | 10.1109/ROPEC48299.2019.9057044 |
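
The kind of comparison the summary describes can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the record does not name the three initialization methods or the seventeen benchmarks, so the sketch substitutes uniform random seeding and k-means++ seeding, uses a small synthetic data set from a hypothetical `make_blobs` helper, and scores each run with SSE and the Adjusted Rand Index only (the Silhouette Coefficient is omitted for brevity).

```python
import math
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def seed_random(points, k, rng):
    """Assumed baseline: pick k distinct data points uniformly at random."""
    return rng.sample(points, k)


def seed_kmeanspp(points, k, rng):
    """k-means++ seeding: sample each new center with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, iters=50):
    """Plain Lloyd iterations: assign points, recompute means, repeat."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    sum_ij = sum(math.comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / math.comb(len(truth), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def make_blobs(rng, centers, n_per, spread):
    """Hypothetical synthetic benchmark: 2-D Gaussian blobs with known labels."""
    points, truth = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            truth.append(label)
    return points, truth


def evaluate(seed_fn, points, truth, k, rng):
    """Run one seeded K-Means and return (SSE, ARI)."""
    labels, centers = lloyd(points, seed_fn(points, k, rng))
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return sse, adjusted_rand_index(truth, labels)


rng = random.Random(0)
points, truth = make_blobs(rng, [(0, 0), (10, 0), (5, 9)], n_per=30, spread=1.0)
results = {
    name: evaluate(fn, points, truth, 3, rng)
    for name, fn in [("random", seed_random), ("k-means++", seed_kmeanspp)]
}
for name, (sse, ari) in results.items():
    print(f"{name:10s} SSE={sse:.2f} ARI={ari:.3f}")
```

A single run per method is only a toy comparison. A study like the one summarized above would repeat each seeding many times over several data sets before comparing how the quality measures rank the methods.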