Entropy-weighted medoid shift: An automated clustering algorithm for high-dimensional data
Published in: | Applied soft computing 2025-01, Vol.169, Article 112347 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Unveiling the intrinsic structure within high-dimensional data presents a significant challenge, particularly when clusters manifest themselves in lower-dimensional subspaces rather than in the full feature space. This complexity is prevalent in real-world datasets, such as text documents and images, which often contain numerous noisy or sparse features. Traditional clustering methods often overlook these latent subspace structures. This paper introduces a novel subspace-based clustering algorithm designed explicitly to address this challenge. Building upon the robust medoid shift framework, we integrate a dimensionality reduction scheme that dynamically projects data onto evolving subspaces determined through entropy-constrained optimization. This approach effectively filters irrelevant information and identifies underlying clusters, optimizing subspace representation while avoiding trivial solutions. Unlike existing methods, our algorithm ensures convergence without necessitating stopping criteria, thereby enabling efficient processing of large datasets. We validate the efficacy of our approach through extensive experiments on synthetic and real-world datasets, demonstrating substantial performance enhancements over state-of-the-art techniques. By explicitly uncovering the underlying subspace structures, our method opens new avenues for effective high-dimensional data clustering and offers valuable insights into complex data environments.
• Novel mode-seeking algorithm for clustering high-dimensional datasets in projected subspaces.
• Subspace-determining scheme enhances accuracy of cluster identification.
• Guaranteed convergence without stopping criteria in the proposed algorithm.
• Experimental studies show effectiveness against state-of-the-art algorithms in high-dimensional environments with high noise-to-signal ratio. |
---|---|
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2024.112347 |
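
The summary describes alternating between a medoid-shift step and an entropy-constrained choice of subspace. The record does not give the paper's actual formulation, so the following is only a rough illustration of that general idea, not the authors' algorithm. It assumes a Gaussian kernel, a weighted squared Euclidean distance, and a closed-form feature-weight update of the kind used in entropy-weighted k-means (w_j proportional to exp(-D_j / gamma)). All function names and the parameters `bandwidth`, `gamma`, and `n_rounds` are hypothetical.

```python
import numpy as np


def entropy_weights(X, medoid_idx, gamma=1.0):
    """Assumed update: minimise sum_j w_j*D_j + gamma*sum_j w_j*log(w_j)
    subject to sum_j w_j = 1, which gives w_j proportional to exp(-D_j/gamma).
    D_j is the mean squared deviation from the assigned medoid on feature j."""
    D = ((X - X[medoid_idx]) ** 2).mean(axis=0)
    z = -D / gamma
    z -= z.max()  # shift for numerical stability; does not change the result
    w = np.exp(z)
    return w / w.sum()


def medoid_shift_step(X, w, bandwidth):
    """One medoid-shift step: each point k moves to the data point y that
    minimises sum_i K(x_k, x_i) * d_w(x_i, y), with d_w the weighted
    squared Euclidean distance and K an assumed Gaussian kernel."""
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2 * w).sum(axis=2)            # (n, n); O(n^2 * d) memory
    K = np.exp(-dist / (2.0 * bandwidth ** 2))    # kernel weights
    cost = K @ dist                               # cost[k, y]
    return cost.argmin(axis=1)


def entropy_weighted_medoid_shift(X, bandwidth=1.0, gamma=1.0, n_rounds=5):
    """Alternate medoid shift and feature re-weighting for a fixed number
    of rounds (a simplification chosen for this sketch)."""
    n, d = X.shape
    w = np.full(d, 1.0 / d)                       # start with uniform weights
    labels = np.arange(n)
    for _ in range(n_rounds):
        labels = medoid_shift_step(X, w, bandwidth)
        # Pointer jumping: follow each point's pointer chain towards its mode.
        for _ in range(int(np.ceil(np.log2(n))) + 1):
            labels = labels[labels]
        w = entropy_weights(X, labels, gamma)
    return labels, w
```

Calling `entropy_weighted_medoid_shift(X)` on an `(n, d)` array returns, for each point, the index of the medoid it was assigned to, together with the learned feature weights. Features on which points sit close to their medoid receive larger weights, and a larger `gamma` keeps the weights closer to uniform, which is one way to avoid the trivial solution of putting all weight on a single feature.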