Locating Multiple Equivalent Feature Subsets in Feature Selection for Imbalanced Classification

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2023-09, Vol.35 (9), p.1-14
Main Authors: Han, Shoufei, Zhu, Kun, Zhou, MengChu, Alhumade, Hesham, Abusorrah, Abdullah
Format: Article
Language: English
Description
Summary: Feature selection can be used to solve imbalanced classification problems encountered in big data projects. There often exist multiple feature subsets that achieve the same accuracy. These subsets tend to differ in acquisition difficulty and reliability, so they offer decision-makers multiple choices if they can be identified well. This work formulates feature selection as a Multimodal Multiobjective Problem (MMOP), in which a point on the Pareto front in objective space has multiple equivalent feature subsets in decision space. To find more equivalent feature subsets, this work proposes a new multiobjective fireworks algorithm. It extends a recent single-objective fireworks algorithm to a multiobjective version so that it becomes suitable for solving MMOPs. An adaptive strategy and a special archive guidance mechanism are newly designed to improve its performance. Because of its fast learning speed, a weighted extreme learning machine is chosen to classify the datasets and return the classification accuracy. Experimental results show that the proposed algorithm outperforms the compared algorithms on 15 imbalanced classification datasets, comprising 5 low-dimensional and 5 high-dimensional feature selection problems and 5 large-scale problems with larger imbalance ratios, and that it has the shortest runtime among them. In addition, the proposed algorithm is applied to fault diagnosis in self-organizing cellular networks, an important imbalanced classification problem, and the results show that it performs fault diagnosis well.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2022.3222047
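
Note: The summary's central idea, that one point on the Pareto front can correspond to several equivalent feature subsets, can be illustrated with a small sketch. This is not the article's algorithm. The two objectives (error rate and number of selected features, both minimized), the function names, and the toy values are assumptions made only for illustration.

```python
from collections import defaultdict


def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def equivalent_pareto_subsets(evaluated):
    """Map each non-dominated objective vector to all feature subsets reaching it.

    `evaluated` maps a frozenset of feature indices to its objective tuple.
    Grouping by exact tuple equality is a simplification for this sketch.
    """
    groups = defaultdict(list)
    for subset, objectives in evaluated.items():
        groups[tuple(objectives)].append(subset)
    return {
        point: sorted(subsets, key=sorted)
        for point, subsets in groups.items()
        if not any(dominates(other, point) for other in groups)
    }


# Hypothetical toy evaluations: (error rate, number of selected features).
toy = {
    frozenset({0, 1}): (0.10, 2),
    frozenset({2, 3}): (0.10, 2),   # same objectives as {0, 1}: an equivalent subset
    frozenset({0}): (0.20, 1),
    frozenset({0, 1, 2}): (0.10, 3),  # dominated by (0.10, 2)
    frozenset({1, 3}): (0.25, 2),     # dominated by (0.10, 2)
}

front = equivalent_pareto_subsets(toy)
for point, subsets in sorted(front.items()):
    print(point, [sorted(s) for s in subsets])
```

In this toy case the point (0.10, 2) is reached by two different subsets, which is the situation the summary describes: a decision-maker could choose between them based on how difficult or reliable the features are to acquire.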