Loading…

Fast density peak clustering for large scale data based on kNN

Density Peak (DPeak) clustering algorithm is not applicable for large scale data, due to two quantities, i.e, ρ and δ, are both obtained by brute force algorithm with complexity O(n2). Thus, a simple but fast DPeak, namely FastDPeak,11https://github.com/XiaoLHu/FastDPeak?tdsourcetag=s_pcqq_aiomsg. i...

Full description

Saved in:
Bibliographic Details
Published in:Knowledge-based systems 2020-01, Vol.187, p.104824, Article 104824
Main Authors: Chen, Yewang, Hu, Xiaoliang, Fan, Wentao, Shen, Lianlian, Zhang, Zheng, Liu, Xin, Du, Jixiang, Li, Haibo, Chen, Yi, Li, Hailin
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Density Peak (DPeak) clustering algorithm is not applicable for large scale data, due to two quantities, i.e, ρ and δ, are both obtained by brute force algorithm with complexity O(n2). Thus, a simple but fast DPeak, namely FastDPeak,11https://github.com/XiaoLHu/FastDPeak?tdsourcetag=s_pcqq_aiomsg. is proposed, which runs in about O(nlog(n)) expected time in the intrinsic dimensionality. It replaces density with kNN-density, which is computed by fast kNN algorithm such as cover tree, yielding huge improvement for density computations. Based on kNN-density, local density peaks and non-local density peaks are identified, and a fast algorithm, which uses two different strategies to compute δ for them, is also proposed with complexity O(n). Experimental results show that FastDPeak is effective and outperforms other variants of DPeak.
ISSN:0950-7051
1872-7409
DOI:10.1016/j.knosys.2019.06.032