Comparative analysis on imbalanced multi-class classification for malware samples using CNN
Main Authors: | Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan |
---|---|
Format: | Default Conference proceeding |
Published: | 2020 |
Online Access: | https://hdl.handle.net/2134/10007939.v1 |
author | Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan |
collection | Figshare |
description | Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced: the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We use a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling and cross-validation improve the model's classification performance to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may vary with the problem domain. |
id | rr-article-10007939 |
institution | Loughborough University |
date | 2020-09-10 |
rights | All Rights Reserved |
figshare_url | https://figshare.com/articles/conference_contribution/Comparative_analysis_on_imbalanced_multi-class_classification_for_malware_samples_using_CNN/10007939 |
topic | Malware classification; Imbalanced dataset; Deep learning |
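The abstract names cost-sensitive learning, oversampling and cross-validation as techniques that helped on imbalanced multi-class data. This record does not describe how the authors implemented them, so the following is only a minimal, generic sketch of one common form of each, not the paper's code and not a CNN. The assumptions are as follows. Cost-sensitive learning is shown as inverse-frequency class weights applied to cross-entropy. Oversampling is shown as plain random duplication of minority samples, although other variants exist. Cross-validation is shown as stratified k-fold, although the abstract does not say whether folds were stratified. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Cost-sensitive learning (assumed form): weight = n_samples / (n_classes * class_count).

    Rare classes get larger weights, so errors on them cost more in the loss.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy of one prediction (probs indexed by class), scaled by the true class's weight."""
    return -weights[label] * math.log(probs[label])


def random_oversample(samples, labels, seed=0):
    """Random oversampling (assumed form): duplicate randomly chosen samples of each
    smaller class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def stratified_folds(labels, k, seed=0):
    """Stratified k-fold (assumed form): deal each class's shuffled indices round-robin
    across k folds, so every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps fold sizes balanced across classes
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    return folds


if __name__ == "__main__":
    # Hypothetical toy data: class 0 ("benign") outnumbers class 1 ("malware") 8 to 2.
    labels = [0] * 8 + [1] * 2
    samples = list(range(len(labels)))
    print(inverse_frequency_weights(labels))  # {0: 0.625, 1: 2.5}
    xs, ys = random_oversample(samples, labels)
    print(Counter(ys))  # both classes now have 8 samples
    print(stratified_folds(labels, k=2))  # each fold holds 4 class-0 and 1 class-1 index
```

Oversampling and class weighting both counter the imbalance, one by changing the data and the other by changing the loss. In practice, oversampling should be applied only to the training portion of each fold so that duplicated samples do not leak into validation.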