
Comparative analysis on imbalanced multi-class classification for malware samples using CNN


Bibliographic Details
Main Authors: Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan
Format: Conference proceeding
Published: 2020
Subjects: Malware classification; Imbalanced dataset; Deep learning
Online Access: https://hdl.handle.net/2134/10007939.v1
author Arwa Alzammam
Hamad Binsalleeh
Basil AsSdhan
Kostas Kyriakopoulos
Sangarapillai Lambotharan
collection Figshare
description Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software samples still greatly outnumber malware samples. In machine learning, such datasets are known as imbalanced: the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of several techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We use a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling and cross-validation have positive effects on the model's classification performance, to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may vary with the problem domain.
format Conference proceeding
id rr-article-10007939
institution Loughborough University
publishDate 2020
date 2020-09-10
rights All Rights Reserved
figshare_url https://figshare.com/articles/conference_contribution/Comparative_analysis_on_imbalanced_multi-class_classification_for_malware_samples_using_CNN/10007939
title Comparative analysis on imbalanced multi-class classification for malware samples using CNN
topic Malware classification
Imbalanced dataset
Deep learning
url https://hdl.handle.net/2134/10007939.v1
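The abstract names cost-sensitive learning, oversampling and cross-validation but does not describe how each was configured. As an illustration only, and not the authors' implementation, the Python sketch below shows one common form of each: per-class weights inversely proportional to class frequency (the same formula as scikit-learn's `class_weight="balanced"` heuristic), random oversampling by duplicating minority-class samples, and stratified k-fold assignment. The function names (`inverse_frequency_weights`, `random_oversample`, `stratified_folds`) and the toy class labels are hypothetical.

```python
import random
from collections import Counter

# Hypothetical helpers illustrating common imbalance remedies; not the paper's code.


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights, so misclassifying them costs more
    when these weights scale a per-sample loss (cost-sensitive learning).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    Indices are grouped by class, shuffled, and dealt round-robin across
    folds, so each fold receives an (almost) equal share of every class.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # continue the round-robin across classes to balance fold sizes
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    return folds


if __name__ == "__main__":
    # Toy imbalanced label set: one dominant class and two rare ones.
    labels = ["benign"] * 90 + ["trojan"] * 8 + ["worm"] * 2
    print(inverse_frequency_weights(labels))
    _, balanced = random_oversample(list(range(len(labels))), labels)
    print(Counter(balanced))
    print([len(f) for f in stratified_folds(labels, k=5)])
```

In a CNN pipeline, such weights would typically scale each sample's loss by its class weight (for example through the `class_weight` argument of Keras `Model.fit`, after mapping labels to integer indices). Oversampling should be applied only to the training folds, so that duplicated samples do not leak into the validation fold.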