Loading…
Fast spectral clustering learning with hierarchical bipartite graph for large-scale data
•A spectral clustering based on hierarchical bipartite graph approach is proposed.•The approach could solve the large scale data better.•The approach has better performance in clustering accuracy and time cost.•The approaches can avoid adjusting the heat-kernel parameter σ.•The approach could quickl...
Saved in:
Published in: | Pattern recognition letters 2020-02, Vol.130, p.345-352 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | •A spectral clustering based on hierarchical bipartite graph approach is proposed.•The approach could solve the large scale data better.•The approach has better performance in clustering accuracy and time cost.•The approaches can avoid adjusting the heat-kernel parameter σ.•The approach could quickly deal with out-of-sample problem.
Spectral clustering (SC) is drawing more and more attention due to its effectiveness in unsupervised learning. However, all of these methods still have limitations. First, the method is not suitable for large-scale problems due to its high computational complexity. Second, the neighborhood weighted graph is constructed by the Gaussian kernel, meaning that more work is required to tune the heat-kernel parameter. In order to overcome these issues, we propose a novel spectral clustering based on hierarchical bipartite graph (SCHBG) approach by exploring multiple-layer anchors with a pyramid-style structure. First, the proposed algorithm constructs a hierarchical bipartite graph, and then performs spectral analysis on the graph. As a result, the computational complexity can be largely reduced. Furthermore, we adopt a parameter-free yet effective neighbor assignment strategy to construct the similarity matrix, which avoids the need to tune the heat-kernel parameter. Finally, the algorithm is able to deal with the out-of-sample problem for large-scale data and its computational complexity is significantly reduced. Experiments demonstrate the efficiency and effectiveness of the proposed SCHBG algorithm. Results show that the SCHBG approach can achieve good clustering accuracy (76%) on an 8-million datasets. Furthermore, owing to the use of the bipartite graph, the algorithm can reduce the time cost for out-of-sample situations with almost the same clustering accuracy as for large sizes of data. |
---|---|
ISSN: | 0167-8655 1872-7344 |
DOI: | 10.1016/j.patrec.2018.06.024 |