Loading…

An empirical assessment of ensemble methods and traditional machine learning techniques for web-based attack detection in industry 5.0

Cybersecurity attacks that target software have become profitable and popular targets for cybercriminals who consciously take advantage of web-based vulnerabilities and execute attacks that might jeopardize essential industry 5.0 features. Several machine learning-based techniques have been develope...

Full description

Saved in:
Bibliographic Details
Published in:Journal of King Saud University. Computer and information sciences 2023-03, Vol.35 (3), p.103-119
Main Authors: Chakir, Oumaima, Rehaimi, Abdeslam, Sadqi, Yassine, Abdellaoui Alaoui, El Arbi, Krichen, Moez, Gaba, Gurjot Singh, Gurtov, Andrei
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Cybersecurity attacks that target software have become profitable and popular targets for cybercriminals who consciously take advantage of web-based vulnerabilities and execute attacks that might jeopardize essential industry 5.0 features. Several machine learning-based techniques have been developed in the literature to identify these types of assaults. In contrast to single classifiers, ensemble methods have not been evaluated empirically. To the best of our knowledge, this work is the first empirical evaluation of both homogeneous and heterogeneous ensemble approaches compared to single classifiers for web-based attack detection in industry 5.0, utilizing two of the most realistic public web-based attack datasets. The authors divided the experiment into three main phases: In the first phase, they evaluated the performance of five well-established supervised machine learning (ML) classifiers. In the second phase, they constructed a heterogeneous ensemble of the three best-performing ML algorithms using max voting and stacking methods. In the third phase, they used four well-known homogeneous ensembles to evaluate the performance of the bagging and boosting method. The results based on the ECML/PKDD 2007 and CSIC HTTP 2010 datasets revealed that bagging, particularly Random Forest, outperformed single classifiers in terms of accuracy, precision, F-value, FPR, and area of the ROC curve with values of 99.597%, 98.274%, 99.129%, 0.523%, 100 and 99.867%, 99.867%, 99.867%, 0.267%, 100, respectively. In contrast, single classifiers performed better than boosting and stacking. However, in terms of FPR, the boosting exceeded single classifiers. Max voting is appropriate when accuracy, precision, and FPR are the primary concerns, whereas single classifiers can be employed when recall, FNR, training, and prediction times are critical elements. In terms of training time, ensemble approaches are more likely to be affected by data volume than single classifiers. The paper’s findings will help security researchers and practitioners identify the most efficient learning techniques for securing web applications.
ISSN:1319-1578
2213-1248
DOI:10.1016/j.jksuci.2023.02.009