
A new hybrid filter–wrapper feature selection method for clustering based on ranking


Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2016-11, Vol. 214, pp. 866-880
Main Authors: Solorio-Fernández, Saúl; Carrasco-Ochoa, J. Ariel; Martínez-Trinidad, José Fco
Format: Article
Language: English
Description
Summary: Feature selection is a common task in areas such as Pattern Recognition, Data Mining, and Machine Learning, since it can help to improve prediction quality, reduce computation time, and build more understandable models. Although feature selection for supervised classification has been widely studied, feature selection in the absence of class labels, namely feature selection for clustering or unsupervised feature selection, has received less attention. Most existing unsupervised feature selection approaches suffer from the so-called “Bias of Criterion Values to Dimension,” which arises when feature subsets of different cardinality are evaluated by an internal clustering evaluation criterion. In this paper, we introduce a new hybrid filter–wrapper method for clustering, which combines the spectral feature selection framework using the Laplacian Score ranking with a modified Calinski–Harabasz index. In the filter stage, the proposed method sorts the features according to their relevance. In the wrapper stage, it evaluates the features as a subset rather than individually, using two well-known selection strategies and our modified Calinski–Harabasz index, which takes into account the cardinality of the feature subsets under evaluation. Experiments on different datasets show that the proposed method alleviates the “Bias of Criterion Values to Dimension” and identifies and selects more relevant features than other reported hybrid filter–wrapper feature selection methods for clustering. We also compare our results against other state-of-the-art filter and wrapper methods.
ISSN: 0925-2312; 1872-8286
DOI: 10.1016/j.neucom.2016.07.026
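
Illustrative note: the filter stage described in the summary ranks features by Laplacian Score. The sketch below shows one common formulation of that score (a k-nearest-neighbour graph with heat-kernel weights, where a lower score marks a more relevant feature). It is not the authors' implementation. The function names `laplacian_scores` and `rank_features` and the defaults `k` and `t` are assumptions made for illustration. The wrapper stage is not reproduced, because the summary does not give the formula of the modified Calinski–Harabasz index.

```python
import math


def laplacian_scores(X, k=3, t=1.0):
    """Return one Laplacian Score per feature (lower = more relevant).

    X is a list of samples, each a list of numeric feature values.
    k (number of neighbours) and t (heat-kernel width) are illustrative
    defaults, not values taken from the paper.
    """
    n, d = len(X), len(X[0])
    # Squared Euclidean distance between every pair of samples.
    dist = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    deg = [sum(row) for row in S]  # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [x[r] for x in X]
        # Remove the degree-weighted mean so constant features score badly.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        f = [fi - mean for fi in f]
        den = sum(fi * fi * di for fi, di in zip(f, deg))  # f^T D f
        # f^T L f with L = D - S, written as a sum over graph edges.
        num = 0.5 * sum(S[i][j] * (f[i] - f[j]) ** 2
                        for i in range(n) for j in range(n))
        scores.append(num / den if den > 0 else float("inf"))
    return scores


def rank_features(X, k=3, t=1.0):
    """Return feature indices sorted from most to least relevant."""
    scores = laplacian_scores(X, k, t)
    return sorted(range(len(scores)), key=lambda r: scores[r])


if __name__ == "__main__":
    # Feature 0 separates two groups; feature 1 is unstructured noise.
    data = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
            [5.0, 0.8], [5.1, 0.2], [5.2, 0.7]]
    print(rank_features(data, k=2))
```

In a hybrid scheme of the kind the summary describes, a ranking like this would be handed to a wrapper stage that evaluates candidate subsets with a clustering criterion.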