An empirical study on the joint impact of feature selection and data resampling on imbalance classification


Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), 2023-03, Vol. 53 (5), p. 5449-5461
Main Authors: Zhang, Chongsheng; Soda, Paolo; Bi, Jingjun; Fan, Gaojuan; Almpanidis, George; García, Salvador; Ding, Weiping
Format: Article
Language: English
Description
Summary: Many real-world datasets exhibit imbalanced distributions, in which the majority classes have sufficient samples, whereas the minority classes often have a very small number of samples. Data resampling has proven to be effective in alleviating such imbalanced settings, while feature selection is a commonly used technique for improving classification performance. However, the joint impact of feature selection and data resampling on two-class imbalance classification has rarely been addressed before. This work investigates the performance of two opposite imbalanced classification frameworks in which feature selection is applied before or after data resampling. We conduct a large-scale empirical study with a total of 9225 experiments on 52 publicly available datasets. The results show that both frameworks should be considered for finding the best performing imbalanced classification model. We also study the impact of classifiers, the ratio between the number of majority and minority samples (IR), and the ratio between the number of samples and features (SFR) on the performance of imbalance classification. Overall, this work provides a new reference value for researchers and practitioners in imbalance learning.
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-03772-1
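
Illustration (not part of the catalog record): the summary defines two dataset ratios, IR and SFR, and compares two pipeline orders, feature selection before resampling and feature selection after it. The sketch below shows how those quantities and orderings could be expressed. It is a minimal example under stated assumptions, not the authors' implementation: random oversampling and variance-based feature ranking are simple stand-ins chosen here for brevity, and every function name and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import pvariance


def imbalance_ratio(y):
    """IR: number of majority-class samples divided by minority-class samples."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


def sample_feature_ratio(X):
    """SFR: number of samples divided by number of features."""
    return len(X) / len(X[0])


def random_oversample(X, y, seed=0):
    """Stand-in resampler (assumption): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def top_k_variance_features(X, k):
    """Stand-in selector (assumption): indices of the k highest-variance columns."""
    variances = [pvariance(col) for col in zip(*X)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(X, cols):
    """Keep only the selected feature columns."""
    return [[row[j] for j in cols] for row in X]


def select_then_resample(X, y, k, seed=0):
    """Order 1: choose features on the original data, then resample."""
    cols = top_k_variance_features(X, k)
    X_res, y_res = random_oversample(project(X, cols), y, seed)
    return X_res, y_res, cols


def resample_then_select(X, y, k, seed=0):
    """Order 2: resample first, then choose features on the resampled data."""
    X_res, y_res = random_oversample(X, y, seed)
    cols = top_k_variance_features(X_res, k)
    return project(X_res, cols), y_res, cols


# Hypothetical toy data: 6 majority rows (label 0), 2 minority rows (label 1).
X = [
    [0.0, 1.0, 5.0], [0.1, 2.0, 5.0], [0.2, 3.0, 5.1], [0.1, 4.0, 5.0],
    [0.0, 5.0, 5.0], [0.2, 6.0, 5.1], [3.0, 3.5, 5.0], [3.1, 3.6, 5.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print("IR:", imbalance_ratio(y))
print("SFR:", sample_feature_ratio(X))
for pipeline in (select_then_resample, resample_then_select):
    X_new, y_new, cols = pipeline(X, y, k=2)
    print(pipeline.__name__, "features:", cols, "class counts:", dict(Counter(y_new)))
```

Because resampling changes the statistics that a selector sees, the two orders can pick different features on real data, which is consistent with the summary's conclusion that both frameworks should be tried. In an actual experiment, selection and resampling would normally be fitted on training folds only and followed by a classifier.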