Ensemble feature selection: Homogeneous and heterogeneous approaches

Bibliographic Details
Published in: Knowledge-Based Systems, 2017-02, Vol. 118, pp. 124-139
Main Authors: Seijo-Pardo, B., Porto-Díaz, I., Bolón-Canedo, V., Alonso-Betanzos, A.
Format: Article
Language: English
Description
Summary:

Highlights:
• Over the last years, ensemble learning has been the focus of much attention.
• We apply two different designs of ensemble learning to the feature selection process.
• The homogeneous ensemble distributes the dataset over different nodes.
• The heterogeneous ensemble combines the results of different feature selection methods.
• We reduce the training time and free the user from having to choose a single feature selection method.

Abstract: In the last decade, ensemble learning has become a prolific discipline in pattern recognition, based on the assumption that combining the outputs of several models obtains better results than the output of any individual model. On the basis that the same principle can be applied to feature selection, we describe two approaches: (i) homogeneous, i.e., using the same feature selection method with different training data, distributing the dataset over several nodes; and (ii) heterogeneous, i.e., using different feature selection methods with the same training data. Both approaches are based on combining rankings of features that contain all the ordered features. The results of the base selectors are combined using different combination methods, also called aggregators, and a practical subset is selected according to several different threshold values (traditional values based on fixed percentages, and more novel automatic methods based on data complexity measures). In testing with a Support Vector Machine as the classifier, ensemble results on seven datasets demonstrate performance that is at least comparable to, and often better than, that of the individual feature selection methods.
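Illustration (not part of the record): a minimal sketch of the heterogeneous design described in the abstract, assuming Python with scikit-learn. Three ranker-type base selectors (ANOVA F-score, mutual information, and random-forest importances) are run on the same training data, their rankings are fused by mean rank (one possible aggregator), and a fixed-percentage threshold keeps the top features before training an SVM. The dataset, the choice of rankers, the aggregator, and the 25% threshold are illustrative assumptions, not the configurations evaluated in the paper; the homogeneous design would instead run a single selector on several partitions of the data and fuse those rankings the same way.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset; the paper evaluates seven benchmark datasets.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def scores_to_ranks(scores):
    # Convert relevance scores into ranks (0 = most relevant feature).
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

# Heterogeneous ensemble: different base selectors, same training data.
rankings = [
    scores_to_ranks(f_classif(X_tr, y_tr)[0]),                          # ANOVA F-score
    scores_to_ranks(mutual_info_classif(X_tr, y_tr, random_state=0)),   # mutual information
    scores_to_ranks(
        RandomForestClassifier(random_state=0).fit(X_tr, y_tr).feature_importances_
    ),
]

# Aggregate the rankings by mean rank and keep a fixed percentage of features.
mean_rank = np.mean(rankings, axis=0)
n_keep = max(1, int(0.25 * X_tr.shape[1]))   # 25% threshold, an illustrative choice
selected = np.argsort(mean_rank)[:n_keep]

# Evaluate the selected subset with a Support Vector Machine, as in the paper's tests.
svm = make_pipeline(StandardScaler(), SVC())
svm.fit(X_tr[:, selected], y_tr)
print("SVM accuracy on ensemble-selected features:", svm.score(X_te[:, selected], y_te))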
ISSN: 0950-7051
1872-7409
DOI: 10.1016/j.knosys.2016.11.017