An evolving approach to the similarity-based modeling for online clustering in non-stationary environments
Published in: | Evolving systems 2025-02, Vol.16 (1), p.16 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: | This paper proposes a novel evolving approach based on Similarity-Based Modeling (SBM), a technique widely used in industrial applications of anomaly detection and multiclass classification. Inheriting from SBM, the proposed approach represents each cluster with a simple model matrix composed of historical points. To process an input instance, it generates one estimate per cluster and then assigns the input to the most similar cluster according to a novel membership function that considers approximation error and data density. The main features of the approach are a simple and intuitive learning scheme, the ability to model clusters of any shape without micro-cluster-like procedures, robustness to noisy data, and low computational effort. Its effectiveness is evaluated on fifteen datasets widely used in the literature, assessing its ability to deal with overlapping clusters, arbitrarily shaped clusters, noisy data, and high dimensionality. Using the Adjusted Rand Index (ARI) and Purity metrics, the proposed algorithm was compared with eight recent state-of-the-art algorithms. It achieved the highest performance on most of the datasets and performance similar to the other methods on the remaining ones. Averaged over the fifteen datasets, the approach achieved an ARI of 0.8872 and a Purity of 0.9107. For comparison, the most competitive method in terms of ARI averaged 0.6988, and the most competitive in terms of Purity averaged 0.9257. These results indicate the effectiveness of the proposed approach. |
---|---|
ISSN: | 1868-6478 1868-6486 |
DOI: | 10.1007/s12530-024-09646-w |
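
The summary reports results as Adjusted Rand Index (ARI) and Purity. As a rough illustration of what these two metrics measure, the sketch below computes both from a list of true class labels and a list of predicted cluster labels, using the standard pair-counting definition of ARI. It is not the authors' evaluation code: the function names `purity` and `adjusted_rand_index` and the toy labels are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import comb


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    counts = Counter(zip(labels_pred, labels_true))
    majority = {}
    for (cluster, _cls), count in counts.items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: agreement between two partitions, corrected for chance."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumption: treat trivial inputs as perfect agreement
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    # Number of point pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in row_sums.values())
    pairs_pred = sum(comb(c, 2) for c in col_sums.values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions put all points together
    return (index - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster IDs need not match class IDs.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [5, 5, 7, 7, 9, 9]
print(purity(y_true, y_pred), adjusted_rand_index(y_true, y_pred))
```

Purity can only stay the same or rise when clusters are split further, while ARI penalizes such over-segmentation, which is one reason a method can lead on one metric and trail on the other.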