Comparative analysis on imbalanced multi-class classification for malware samples using CNN


Bibliographic Details
Main Authors: Arwa Alzammam, Hamad Binsalleeh, Basil AsSdhan, Kostas Kyriakopoulos, Sangarapillai Lambotharan
Format: Default Conference proceeding
Published: 2020
Online Access: https://hdl.handle.net/2134/10007939.v1
Description
Summary: Malware is considered one of the main actors in cyber attacks. Every day, the number of unique malware samples is on the rise; however, benign software still greatly outnumbers malware. In machine learning, such datasets are known as imbalanced: the majority class greatly dominates the others. In this paper, we present a comparative analysis and evaluation of some of the techniques proposed in the literature to address the problem of classifying imbalanced multi-class malware datasets. We use a Convolutional Neural Network (CNN) as the classification algorithm to study the effect of imbalanced datasets on deep learning approaches. The experiments are conducted on three publicly available imbalanced datasets. Our performance analysis shows that methods such as cost-sensitive learning, oversampling, and cross-validation have positive effects on the model's classification performance, to varying degrees, while others, such as using pre-trained models, require more specialized parameter settings. However, best practice may change according to the problem domain.
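
As an illustration of two of the techniques named in the summary, the sketch below shows one common way to derive cost-sensitive class weights and to randomly oversample minority classes. It is not taken from the paper: the inverse-frequency weighting formula, the function names, and the toy labels ("benign", "trojan", "worm") are all assumptions chosen for the example, and the authors' actual settings may differ.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical toy dataset: integer ids stand in for malware samples.
labels = ["benign"] * 6 + ["trojan"] * 3 + ["worm"] * 1
samples = list(range(len(labels)))

weights = inverse_frequency_weights(labels)
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

In a training loop, weights like these would typically scale each class's contribution to the loss, whereas oversampling changes the training set itself. Oversampling should be applied only to the training split, so that duplicated samples do not leak into validation folds.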