Loading…
Parallel and accurate k‐means algorithm on CPU‐GPU architectures for spectral clustering
Summary k‐Means is a standard algorithm for clustering data. It constitutes generally the final step in a more complex chain of high‐quality spectral clustering. However, this chain suffers from lack of scalability when addressing large datasets. This can be overcome by applying also the k‐means alg...
Saved in:
Published in: | Concurrency and computation 2022-06, Vol.34 (14), p.n/a |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Summary
k‐Means is a standard algorithm for clustering data. It constitutes generally the final step in a more complex chain of high‐quality spectral clustering. However, this chain suffers from lack of scalability when addressing large datasets. This can be overcome by applying also the k‐means algorithm as a preprocessing task to reduce the input data instances. We propose parallel optimization techniques for the k‐means algorithm on CPU and GPU. Particularly we use a two‐step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids. Our experiments on synthetic and real‐world datasets containing millions of instances exhibit a speedup up to 7 for the k‐means iteration time on GPU versus 20/40 CPU threads using AVX units, and achieve double‐precision accuracy with single‐precision computations. |
---|---|
ISSN: | 1532-0626 1532-0634 |
DOI: | 10.1002/cpe.6621 |