Evaluating the impact of filter-based feature selection in intrusion detection systems

Bibliographic Details
Published in: International journal of information security 2024-04, Vol.23 (2), p.759-785
Main Authors: Zouhri, Houssam, Idri, Ali, Ratnani, Ahmed
Format: Article
Language: English
Description
Summary: High dimensionality can lead to overfitting and affect the modeling power of classification algorithms, resulting in an increase in false positive rate (FPR) and false negative rate (FNR). Feature selection is therefore a critical issue that must be addressed with efficient techniques when developing Intrusion Detection Systems. This study seeks to assess and compare the impacts of five univariate filters (ReliefF, Pearson Correlation, Mutual Information, ANOVA, and Chi2) with different selection thresholds, and three multivariate filters (Correlation-based feature subset selection, Double Input Symmetric Relevance (DISR), and Consistency-based subset selection (CON)) on the performance of four classifiers (Multilayer Perceptron, Support Vector Machines, XGBoost, and Random Forest) on the CIC-IDS2017, CSE-CIC-IDS2018, and CIC-ToN-IoT intrusion detection datasets. We evaluate 228 classifier variants to determine the features that positively impact classification efficiency across all the cyber-attack scenarios used. The results show that XGBoost and Random Forest trained with multivariate methods, such as CON and DISR, can effectively reduce the number of features without affecting classification performance and detection rate, compared to the other filtering methods.
ISSN: 1615-5262, 1615-5270
DOI: 10.1007/s10207-023-00767-y
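
The summary mentions univariate filters applied with a selection threshold, and FPR/FNR as the error measures of interest. The following is a minimal sketch of that general idea only, not the authors' pipeline: it ranks features by absolute Pearson correlation with the label and keeps a top fraction, then computes FPR and FNR from predictions. The function names (`pearson`, `select_top_fraction`, `error_rates`), the 50% threshold, and the toy data are all hypothetical and chosen for illustration; none of them come from the article or its datasets.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_fraction(rows, labels, fraction):
    """Univariate filter: score each feature on its own, keep the top fraction.

    Returns the sorted indices of the kept features and the per-feature scores.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, labels)))
    keep = max(1, math.ceil(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores


def error_rates(y_true, y_pred):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    pairs = list(zip(y_true, y_pred))
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Toy data (invented): 8 flows, 4 features, label 1 = attack, 0 = benign.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [
    [0, 5, 9, 1],
    [0, 5, 8, 2],
    [0, 5, 9, 1],
    [0, 5, 7, 2],
    [1, 5, 2, 1],
    [1, 5, 1, 2],
    [1, 5, 3, 1],
    [1, 5, 2, 2],
]

# Hypothetical threshold: keep the top 50% of features.
selected, scores = select_top_fraction(rows, labels, fraction=0.5)

# Hypothetical classifier output, used only to illustrate the two error rates.
predictions = [0, 0, 0, 1, 1, 1, 0, 0]
fpr, fnr = error_rates(labels, predictions)
```

In this toy setup the constant feature and the feature that alternates independently of the label both score zero, so the filter keeps the other two. A multivariate filter such as those named in the summary would instead score subsets of features jointly, which this sketch does not attempt.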