Generalized Possibilistic Fuzzy C-Means with novel cluster validity indices for clustering noisy data

Bibliographic Details
Published in:	Applied Soft Computing, 2017-04, Vol. 53, pp. 262-283
Main Authors:	Askari, S., Montazerin, N., Fazel Zarandi, M.H.
Format:	Article
Language:	English
Subjects:	Cluster Validity Index (CVI); Fuzzy C-Means (FCM); Fuzzy clustering; Noise; Possibilistic C-Means (PCM)

Description
Summary:	Highlights:
• A fuzzy clustering algorithm is presented for noisy data.
• It works with the covariance norm, in contrast to PFCM.
• Its error is about 80% lower than that of PFCM.
• It uses a function of the distance instead of the distance itself.
A generalized form of the Possibilistic Fuzzy C-Means (PFCM) algorithm, GPFCM, is presented for clustering noisy data. A function of the distance is used instead of the distance itself to damp the contribution of noise. It is shown that when the data are highly noisy, GPFCM finds accurate cluster centers, whereas FCM (Fuzzy C-Means), PCM (Possibilistic C-Means), and PFCM fail. FCM, PCM, and PFCM yield inaccurate cluster centers when the clusters differ in size or when the covariance norm is used, whereas GPFCM performs well in both cases, even when the data are noisy. It is also shown that the generalized forms of FCM and PCM (GFCM and GPCM) are more accurate than FCM and PCM. A measure is defined to evaluate the performance of the clustering algorithms; by this measure, the average error of GPFCM and its simplified forms is about 80% smaller than that of FCM, PCM, and PFCM. However, GPFCM has a higher computational cost because its updating equations are nonlinear. Three cluster validity indices are introduced to determine the number of clusters in clean and noisy datasets: the first considers the compactness of the clusters, the second their separation, and the third both compactness and separation. The performance of these indices is confirmed to be satisfactory on various examples of noisy datasets.
ISSN:	1568-4946; 1872-9681
DOI:	10.1016/j.asoc.2016.12.049
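For readers unfamiliar with the baseline that GPFCM generalizes, the following is a minimal sketch of standard PFCM with the Euclidean distance, as commonly stated (Pal et al., 2005). It is not the GPFCM algorithm of this article: the summary does not give GPFCM's distance function or its covariance-norm update equations. The function name `pfcm`, the farthest-point initialization, and the rule of estimating gamma once from the first iteration's memberships are illustrative assumptions, not taken from the article.

```python
def sq_dist(x, v):
    """Squared Euclidean distance between two points."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def pfcm(X, c, a=1.0, b=1.0, m=2.0, eta=2.0, gamma=None, iters=100):
    """Standard PFCM sketch (hypothetical helper, Euclidean distance only).

    Returns (V, U, T): cluster centers, fuzzy memberships U[i][k], and
    typicalities T[i][k] of point k in cluster i.
    """
    n, dim = len(X), len(X[0])
    eps = 1e-12  # avoids division by zero when a point sits on a center

    # Assumption: deterministic farthest-point initialization of the centers.
    V = [list(X[0])]
    while len(V) < c:
        far = max(X, key=lambda x: min(sq_dist(x, v) for v in V))
        V.append(list(far))

    g = gamma
    U, T = [], []
    for _ in range(iters):
        D2 = [[max(sq_dist(x, v), eps) for x in X] for v in V]  # D2[i][k]

        # Fuzzy memberships: u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
        U = [[1.0 / sum((D2[i][k] / D2[j][k]) ** (1.0 / (m - 1)) for j in range(c))
              for k in range(n)] for i in range(c)]

        # Assumption: if gamma is not given, estimate it once as
        # gamma_i = sum_k u_ik^m d_ik^2 / sum_k u_ik^m, then keep it fixed.
        if g is None:
            g = [sum(U[i][k] ** m * D2[i][k] for k in range(n)) /
                 sum(U[i][k] ** m for k in range(n)) for i in range(c)]

        # Typicalities: t_ik = 1 / (1 + (b d_ik^2 / gamma_i)^(1/(eta-1)))
        T = [[1.0 / (1.0 + (b * D2[i][k] / g[i]) ** (1.0 / (eta - 1)))
              for k in range(n)] for i in range(c)]

        # Centers: weighted means with weights a*u^m + b*t^eta
        for i in range(c):
            w = [a * U[i][k] ** m + b * T[i][k] ** eta for k in range(n)]
            s = sum(w)
            V[i] = [sum(w[k] * X[k][d] for k in range(n)) / s for d in range(dim)]
    return V, U, T


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    centers, _, _ = pfcm(pts, 2)
    print(sorted(centers))
```

Setting b = 0 makes the center update identical to FCM's, which gives a simple way to compare two of the baselines named in the summary on the same data.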