Loading…

Designing an efficient parallel spectral clustering algorithm on multi-core processors in Julia

Spectral clustering is widely used in data mining, machine learning and other fields. It can identify the arbitrary shape of a sample space and converge to the global optimal solution. Compared with the traditional k-means algorithm, the spectral clustering algorithm has stronger adaptability to dat...

Full description

Saved in:
Bibliographic Details
Published in:Journal of parallel and distributed computing 2020-04, Vol.138, p.211-221
Main Authors: Huo, Zenan, Mei, Gang, Casolla, Giampaolo, Giampaolo, Fabio
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Spectral clustering is widely used in data mining, machine learning and other fields. It can identify the arbitrary shape of a sample space and converge to the global optimal solution. Compared with the traditional k-means algorithm, the spectral clustering algorithm has stronger adaptability to data and better clustering results. However, the computation of the algorithm is quite expensive. In this paper, an efficient parallel spectral clustering algorithm on multi-core processors in the Julia language is proposed, and we refer to it as juPSC. The Julia language is a high-performance, open-source programming language. The juPSC is composed of three procedures: (1) calculating the affinity matrix, (2) calculating the eigenvectors, and (3) conducting k-means clustering. Procedures (1) and (3) are computed by the efficient parallel algorithm, and the COO format is used to compress the affinity matrix. Two groups of experiments are conducted to verify the accuracy and efficiency of the juPSC. Experimental results indicate that (1) the juPSC achieves speedups of approximately 14×∼18× on a 24-core CPU and that (2) the serial version of the juPSC is faster than the Python version of scikit-learn. Moreover, the structure and functions of the juPSC are designed considering modularity, which is convenient for combination and further optimization with other parallel computing platforms. •A Julia-based parallel algorithm of the spectral clustering is designed.•The parallel algorithm implements spectral clustering in the form of compression.•The parallel algorithm is designed considering modularity.•The parallel algorithm achieves speedups of approximately 18× on a 24-core CPU.
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2020.01.003