K*-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries

Bibliographic Details
Published in:International journal of parallel programming 2025-02, Vol.53 (1), p.3, Article 3
Main Authors: Long, Jianwu, Liu, Luping
Format: Article
Language:English
Description
Summary:Conventional k-means algorithms often face significant computational burdens and depend heavily on the predefined number of clusters k. This paper therefore proposes the k*-means algorithm, which incorporates the concept of the perceptron classification algorithm to transform the distance-based clustering task into a classification problem, significantly improving clustering efficiency. Moreover, the paper combines k*-means with hierarchical clustering methods that can automatically identify the number of clusters: an initial clustering is performed with k*-means using a large preset number of clusters, and the resulting sub-clusters are then merged through hierarchical clustering. Experimental results show that the proposed k*-means method has significant advantages when handling large-scale datasets: it greatly reduces the number of distance calculations and outperforms the latest accelerated k-means algorithms in runtime. Combined with hierarchical clustering, k*-means also performs notably well on the four synthetic and the four real datasets tested. Future work could explore parallelization techniques to further enhance its scalability and efficiency on even larger datasets.
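
The two-stage scheme the abstract describes (over-cluster with a large preset k, then merge sub-clusters hierarchically) can be sketched roughly as follows. This is only an illustration of the general idea: the paper's perceptron-based k*-means is replaced here by standard Lloyd's k-means, and the distance-threshold merge rule is an assumption, not the paper's actual merging criterion.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Standard Lloyd's k-means (stand-in for the paper's k*-means stage)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers as the mean of their assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def merge_centers(centers, threshold):
    """Greedily merge sub-cluster centers whose distance falls below
    `threshold` (a hypothetical criterion), mimicking the hierarchical
    merging stage that recovers the final number of clusters."""
    groups = [[i] for i in range(len(centers))]
    means = [centers[i].copy() for i in range(len(centers))]
    merged = True
    while merged and len(means) > 1:
        merged = False
        for a in range(len(means)):
            for b in range(a + 1, len(means)):
                if np.linalg.norm(means[a] - means[b]) < threshold:
                    groups[a].extend(groups[b])
                    means[a] = centers[groups[a]].mean(axis=0)
                    del groups[b], means[b]
                    merged = True
                    break
            if merged:
                break
    return groups, np.array(means)

# Usage: over-cluster two well-separated blobs with k=6, then merge
# the six sub-cluster centers back down to the two true clusters.
rng = np.random.default_rng(1)
blob_a = rng.normal((0.0, 0.0), 0.5, (100, 2))
blob_b = rng.normal((10.0, 10.0), 0.5, (100, 2))
X = np.vstack([blob_a, blob_b])
centers, labels = kmeans(X, k=6)
groups, means = merge_centers(centers, threshold=5.0)
```

With a merge threshold well between the within-blob center spread and the between-blob distance, the six sub-clusters collapse into two groups; choosing that threshold (or another stopping rule) automatically is exactly what the hierarchical stage is for.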
ISSN:0885-7458
1573-7640
DOI:10.1007/s10766-024-00779-8