Loading…

Parallel and accurate k‐means algorithm on CPU‐GPU architectures for spectral clustering

Summary k‐Means is a standard algorithm for clustering data. It constitutes generally the final step in a more complex chain of high‐quality spectral clustering. However, this chain suffers from lack of scalability when addressing large datasets. This can be overcome by applying also the k‐means alg...

Full description

Saved in:
Bibliographic Details
Published in:Concurrency and computation 2022-06, Vol.34 (14), p.n/a
Main Authors: He, Guanlin, Vialle, Stephane, Baboulin, Marc
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Summary k‐Means is a standard algorithm for clustering data. It constitutes generally the final step in a more complex chain of high‐quality spectral clustering. However, this chain suffers from lack of scalability when addressing large datasets. This can be overcome by applying also the k‐means algorithm as a preprocessing task to reduce the input data instances. We propose parallel optimization techniques for the k‐means algorithm on CPU and GPU. Particularly we use a two‐step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids. Our experiments on synthetic and real‐world datasets containing millions of instances exhibit a speedup up to 7 for the k‐means iteration time on GPU versus 20/40 CPU threads using AVX units, and achieve double‐precision accuracy with single‐precision computations.
ISSN:1532-0626
1532-0634
DOI:10.1002/cpe.6621