Loading…

Minimum spanning tree‐based cluster analysis: A new algorithm for determining inconsistent edges

In recent years, graph‐based data clustering algorithms have become popular as they perform connectivity‐based rather than centroid‐based partitioning. Methods related to minimum spanning tree (MST)‐based data clustering are types of graph‐based algorithms that can recognize arbitrary shapes of clus...

Full description

Saved in:
Bibliographic Details
Published in:Concurrency and computation 2022-04, Vol.34 (9), p.n/a
Main Authors: Şaar, Fadi, Topcu, Ahmet E.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In recent years, graph‐based data clustering algorithms have become popular as they perform connectivity‐based rather than centroid‐based partitioning. Methods related to minimum spanning tree (MST)‐based data clustering are types of graph‐based algorithms that can recognize arbitrary shapes of clusters by eliminating inconsistent edges from MST graphs. In all MST‐based data clustering algorithms, definition of inconsistent edges is the main problem that needs to be addressed. The longest edges in MST graphs are considered as inconsistent edges under ideal conditions. Nevertheless, outliers often exist in real‐world tasks, which makes the longest edges inaccurate cluster separation indicators. In this paper, we propose a new data clustering algorithm using MST and a critical distance method. The proposed algorithm solves the main issue of MST‐based data clustering, namely identifying and removing inconsistent edges to obtain clusters even in the event that the dataset contains some outliers. It begins by constructing the MST over a given weighted graph based on Euclidean distance and then splits up the graph into clusters by eliminating inconsistent edges using critical distance as a threshold. Integration of the advantages of both MST and critical distance methodology to obtain optimal clusters is the main contribution of this work. The conducted experimental analysis and results using different datasets prove that our proposed clustering algorithm yields better overall performance compared with the most common data clustering algorithms. Taking the Liver and Tumor datasets as an example, the proposed algorithm outperforms all other clustering algorithms with clustering accuracy equal to 0.579 and 0.660, respectively.
ISSN:1532-0626
1532-0634
DOI:10.1002/cpe.6717