A method of two-stage clustering learning based on improved DBSCAN and density peak algorithm

Bibliographic Details
Published in: Computer Communications, 2021-02, Vol. 167, pp. 75-84
Main Authors: Li, Mingyang; Bi, Xinhua; Wang, Limin; Han, Xuming
Format: Article
Language: English
Description
Summary: Density peak (DP) and density-based spatial clustering of applications with noise (DBSCAN) are representative density-based clustering algorithms in unsupervised learning. Both can cluster data of arbitrary shape and identify noise samples in a data set. However, the DP algorithm depends on the decision graph when selecting cluster centers, which makes it difficult for users without prior knowledge to identify the centers automatically and accurately. The clustering performance of DBSCAN is highly sensitive to the settings of its parameters Eps and MinPts. To address these issues, we propose a new two-stage clustering method based on improved DBSCAN and the DP algorithm (TSCM). It first uses an improved DBSCAN algorithm based on bat optimization to generate initial clusters. Specifically, the improved DBSCAN uses Silhouette, a well-known internal clustering validation index that requires no labels, as the fitness function that guides parameter determination by bat optimization. The cluster centers in the decision graph are then selected automatically according to the initial clusters, and the final clusters are obtained by DP with the determined cluster centers. In the experiments, relative to DP and DBSCAN, TSCM effectively removes the need for manual intervention in cluster center selection in DP and in parameter setting in DBSCAN, and the clustering performance is significantly improved.
ISSN: 0140-3664, 1873-703X
DOI: 10.1016/j.comcom.2020.12.019
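
The second stage described in the Summary (centers taken from the initial clusters, then final assignment by DP) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the first stage has already produced `initial_labels` (with -1 for noise), and it assumes that each initial cluster's center is its densest member, a rule the abstract does not spell out. The function name `density_peak_assign`, the cutoff parameter `dc`, and the toy data are all hypothetical.

```python
import math

def density_peak_assign(points, initial_labels, dc):
    """Sketch of a DP-style second stage (assumptions noted in comments).

    points:         list of coordinate tuples
    initial_labels: cluster id per point from stage 1, -1 for noise
    dc:             cutoff distance for local density (hypothetical name)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # ASSUMPTION: the center of each initial cluster is its densest member.
    centers = {}
    for i in order:
        c = initial_labels[i]
        if c != -1 and c not in centers:
            centers[c] = i

    labels = [-1] * n
    for c, i in centers.items():
        labels[i] = c

    # DP assignment: every non-center point takes the label of its nearest
    # already-labelled neighbour that comes earlier in the density order.
    for rank, i in enumerate(order):
        if labels[i] != -1:
            continue
        candidates = [j for j in order[:rank] if labels[j] != -1]
        if candidates:
            nearest = min(candidates, key=lambda j: dist[i][j])
            labels[i] = labels[nearest]
    return labels, centers

# Toy data: two square blobs plus one point left unassigned by stage 1.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 5)]
initial_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
labels, centers = density_peak_assign(points, initial_labels, dc=1.2)
```

In this toy run the middle point of each blob becomes its center, and the point left as noise by the first stage is attached to the cluster of its nearest denser neighbour. The sketch omits the bat-optimized, Silhouette-guided DBSCAN of the first stage and any separate noise handling in DP.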