Loading…

The k-means-u algorithm: non-local jumps and greedy retries improve k-means++ clustering

We present a new clustering algorithm called k-means-u* which in many cases is able to significantly improve the clusterings found by k-means++, the current de-facto standard for clustering in Euclidean spaces. First we introduce the k-means-u algorithm which starts from a result of k-means++ and at...

Full description

Saved in:

Bibliographic Details
Published in:	arXiv.org 2017-07
Main Author:	Fritzke, Bernd
Format:	Article
Language:	English
Subjects:	Algorithms Clustering Greedy algorithms Optimization
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	We present a new clustering algorithm called k-means-u* which in many cases is able to significantly improve the clusterings found by k-means++, the current de-facto standard for clustering in Euclidean spaces. First we introduce the k-means-u algorithm which starts from a result of k-means++ and attempts to improve it with a sequence of non-local "jumps" alternated by runs of standard k-means. Each jump transfers the "least useful" center towards the center with the largest local error, offset by a small random vector. This is continued as long as the error decreases and often leads to an improved solution. Occasionally k-means-u terminates despite obvious remaining optimization possibilities. By allowing a limited number of retries for the last jump it is frequently possible to reach better local minima. The resulting algorithm is called k-means-u* and dominates k-means++ wrt. solution quality which is demonstrated empirically using various data sets. By construction the logarithmic quality bound established for k-means++ holds for k-means-u* as well.
ISSN:	2331-8422