Loading…

Fast spectral clustering learning with hierarchical bipartite graph for large-scale data

•A spectral clustering based on hierarchical bipartite graph approach is proposed.•The approach could solve the large scale data better.•The approach has better performance in clustering accuracy and time cost.•The approaches can avoid adjusting the heat-kernel parameter σ.•The approach could quickl...

Full description

Saved in:
Bibliographic Details
Published in:Pattern recognition letters 2020-02, Vol.130, p.345-352
Main Authors: Yang, Xiaojun, Yu, Weizhong, Wang, Rong, Zhang, Guohao, Nie, Feiping
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:•A spectral clustering based on hierarchical bipartite graph approach is proposed.•The approach could solve the large scale data better.•The approach has better performance in clustering accuracy and time cost.•The approaches can avoid adjusting the heat-kernel parameter σ.•The approach could quickly deal with out-of-sample problem. Spectral clustering (SC) is drawing more and more attention due to its effectiveness in unsupervised learning. However, all of these methods still have limitations. First, the method is not suitable for large-scale problems due to its high computational complexity. Second, the neighborhood weighted graph is constructed by the Gaussian kernel, meaning that more work is required to tune the heat-kernel parameter. In order to overcome these issues, we propose a novel spectral clustering based on hierarchical bipartite graph (SCHBG) approach by exploring multiple-layer anchors with a pyramid-style structure. First, the proposed algorithm constructs a hierarchical bipartite graph, and then performs spectral analysis on the graph. As a result, the computational complexity can be largely reduced. Furthermore, we adopt a parameter-free yet effective neighbor assignment strategy to construct the similarity matrix, which avoids the need to tune the heat-kernel parameter. Finally, the algorithm is able to deal with the out-of-sample problem for large-scale data and its computational complexity is significantly reduced. Experiments demonstrate the efficiency and effectiveness of the proposed SCHBG algorithm. Results show that the SCHBG approach can achieve good clustering accuracy (76%) on an 8-million datasets. Furthermore, owing to the use of the bipartite graph, the algorithm can reduce the time cost for out-of-sample situations with almost the same clustering accuracy as for large sizes of data.
ISSN:0167-8655
1872-7344
DOI:10.1016/j.patrec.2018.06.024