
Comparative analysis on imbalanced multi-class classification for malware samples using CNN

Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced, where the majority class label gre...

Bibliographic Details
Main Authors:	Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan
Format:	Default Conference proceeding
Published:	2020
Subjects:	Malware classification Imbalanced dataset Deep learning
Online Access:	https://hdl.handle.net/2134/10007939.v1

_version_	1818167263467929600
author	Arwa Alzammam Hamad Binsalleeh Basil AsSdhan Kostas Kyriakopoulos Sangarapillai Lambotharan
author_facet	Arwa Alzammam Hamad Binsalleeh Basil AsSdhan Kostas Kyriakopoulos Sangarapillai Lambotharan
author_sort	Arwa Alzammam (7520807)
collection	Figshare
description	Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software still greatly outnumbers malware samples. In machine learning, such datasets are known as imbalanced: the majority class label greatly dominates the others. In this paper, we present a comparative analysis and evaluation of some of the techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We used a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling, and cross-validation have positive effects on the model's classification performance, to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may change according to the problem domain.
format	Default Conference proceeding
id	rr-article-10007939
institution	Loughborough University
publishDate	2020
record_format	Figshare
spelling	rr-article-10007939 2020-09-10T00:00:00Z Comparative analysis on imbalanced multi-class classification for malware samples using CNN Arwa Alzammam (7520807) Hamad Binsalleeh (7524999) Basil AsSdhan (7525002) Kostas Kyriakopoulos (1250595) Sangarapillai Lambotharan (1252278) Malware classification Imbalanced dataset Deep learning 2020-09-10T00:00:00Z Text Conference contribution 2134/10007939.v1 https://figshare.com/articles/conference_contribution/Comparative_analysis_on_imbalanced_multi-class_classification_for_malware_samples_using_CNN/10007939 All Rights Reserved
spellingShingle	Malware classification Imbalanced dataset Deep learning Arwa Alzammam Hamad Binsalleeh Basil AsSdhan Kostas Kyriakopoulos Sangarapillai Lambotharan Comparative analysis on imbalanced multi-class classification for malware samples using CNN
title	Comparative analysis on imbalanced multi-class classification for malware samples using CNN
title_sort	comparative analysis on imbalanced multi-class classification for malware samples using cnn
topic	Malware classification Imbalanced dataset Deep learning
url	https://hdl.handle.net/2134/10007939.v1

Similar Items