Loading…

Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters

In clustering analysis, data attributes may have different contributions to the detection of various clusters. To solve this problem, the subspace clustering technique has been developed, which aims at grouping the data objects into clusters based on the subsets of attributes rather than the entire...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transaction on neural networks and learning systems 2018-08, Vol.29 (8), p.3308-3325
Main Authors: Jia, Hong, Cheung, Yiu-Ming
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In clustering analysis, data attributes may have different contributions to the detection of various clusters. To solve this problem, the subspace clustering technique has been developed, which aims at grouping the data objects into clusters based on the subsets of attributes rather than the entire data space. However, the most existing subspace clustering methods are only applicable to either numerical or categorical data, but not both. This paper, therefore, studies the soft subspace clustering of data with both of the numerical and categorical attributes (also simply called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for the numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both of intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is further introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure as well as the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is also presented, which can effectively improve the stability and accuracy of k -means-type clustering methods on numerical, categorical, and mixed data. The experimental results on different benchmark data sets show the efficacy of the proposed approach.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2017.2728138