Set-based integer-coded fuzzy granular evolutionary algorithms for high-dimensional feature selection
Published in: | Applied soft computing 2023-07, Vol.142, p.110240, Article 110240 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Feature selection plays a pivotal role in handling today’s high-dimensional databases by keeping only the most valuable features, leading to less computation, improved performance, and higher transparency in decision-making processes. Despite the considerable advances in combinatorial optimization, this data-preprocessing step is NP-hard and continues to pose critical challenges, particularly for very high-dimensional (VHD) databases. Here, we propose integer coding and fuzzy granulation (FG) as an integral part of evolutionary wrapper-based feature selection. Based on this integer coding, we further propose crossover and mutation operators that employ set operations such as ‘union,’ ‘intersection,’ and ‘complement’ for higher transparency in their evolutionary explorative and exploitative search processes. In addition to its common use as a surrogate technique that avoids unnecessary computations by recognizing similarities, the fuzzy granulation concept also operates as a repulsive strategy that searches for dissimilarities in the elitist and population initialization routines to reach higher population diversity. An ablation study is conducted to identify the role of the individual components of this multi-pronged approach. The approach is then compared with nine competing methods on 22 benchmark problems, ranging from 64 to 138,672 attributes. The proposed approach shows superior performance in terms of accuracy (in 15 of 22 cases) while achieving substantially smaller feature sets (up to six times smaller) at considerably lower computational cost (30 percent lower on average), particularly for VHD feature selection.
• Very high-dimensional evolutionary feature selection uses integer coding. • Union, intersection, and complement operators perform crossover and mutation. • Fuzzy granulation acts as a diversity-preserving strategy in initialization and elitism. • Fuzzy surrogate granulation is also employed to reduce fitness evaluations. • Up to 30 percent fewer computations for problems with more than 1000 features. |
---|---|
ISSN: | 1568-4946 1872-9681 |
DOI: | 10.1016/j.asoc.2023.110240 |
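
The summary describes crossover and mutation operators built from set operations on integer-coded feature subsets, but this record does not give their definitions. The sketch below is only an illustration of that general idea, not the authors' operators: each individual is a Python set of feature indices, the crossover keeps the intersection of two parents and samples from the rest of their union, and the mutation swaps selected features with ones drawn from the complement. The function names `set_crossover` and `set_mutation`, the 0.5 inclusion probability, and the `n_swaps` parameter are all assumptions made for this example.

```python
import random


def set_crossover(parent_a, parent_b, rng):
    """Illustrative set-based crossover (an assumption, not the paper's operator).

    Features shared by both parents (the intersection) are always inherited;
    features held by only one parent (union minus intersection) are inherited
    with probability 0.5.
    """
    common = parent_a & parent_b
    disputed = (parent_a | parent_b) - common
    child = set(common)
    for feature in sorted(disputed):  # sorted for reproducibility under a seed
        if rng.random() < 0.5:
            child.add(feature)
    return child


def set_mutation(individual, n_features, rng, n_swaps=1):
    """Illustrative set-based mutation (an assumption, not the paper's operator).

    Swaps up to n_swaps selected features with unselected ones taken from the
    complement of the individual, so the subset size is preserved.
    """
    complement = set(range(n_features)) - individual
    child = set(individual)
    for _ in range(n_swaps):
        if not child or not complement:
            break
        removed = rng.choice(sorted(child))
        added = rng.choice(sorted(complement))
        child.remove(removed)
        child.add(added)
        complement.remove(added)
        complement.add(removed)
    return child


if __name__ == "__main__":
    rng = random.Random(0)
    a = {1, 5, 9, 20}
    b = {5, 9, 33, 47}
    child = set_crossover(a, b, rng)
    print("child:", sorted(child))
    print("mutated:", sorted(set_mutation(child, 64, rng)))
```

Storing only the indices of the selected features keeps each individual small even when the dataset has tens of thousands of attributes, which is presumably part of the appeal of integer coding over a full-length binary mask.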