Handling partially labeled network data: A semi-supervised approach using stacked sparse autoencoder

Bibliographic Details
Published in:Computer networks (Amsterdam, Netherlands : 1999), 2022-04, Vol.207, p.108742-12, Article 108742
Main Authors: Aouedi, Ons, Piamrat, Kandaraj, Bagadthey, Dhruvjyoti
Format: Article
Language:English
Description
Summary:Network traffic analytics has become crucial for understanding and managing network resources, especially in the era of network softwarization, where this concept can be implemented easily with network function virtualization. Many approaches have been proposed to improve the performance of traffic classification. However, new types of traffic emerge every day and are generally unlabeled, which poses a new challenge. Another important concern is how to classify traffic accurately using a limited amount of labeled data or partially labeled data, since labeling data is often difficult and time-consuming. To address these issues, we reformulate traffic classification as a semi-supervised learning problem that combines supervised learning (using labeled data) with unsupervised learning (using unlabeled data). To do so, this paper presents a semi-supervised deep-learning model for traffic classification based on a stacked sparse autoencoder (SSAE). The main motivations of this approach are: (i) unlabeled data is often abundant and easily available; (ii) the classification performance of the whole model can be greatly improved when a large amount of unlabeled traffic is included in the training process; (iii) there is a limit to how much human effort can be devoted to the labeling problem. To investigate the performance of our approach, an empirical study was conducted on a real dataset, and the results indicate that using a large amount of unlabeled data in the SSAE pre-training phase can significantly improve the classification performance of the whole model. Furthermore, the proposed approach is compared against other representative machine-learning and deep-learning models: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Multi-Layer Perceptron (MLP), eXtreme Gradient Boosting (XGBoost), and Autoencoder.
ISSN:1389-1286
1872-7069
DOI:10.1016/j.comnet.2021.108742
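
The semi-supervised scheme outlined in the summary (unsupervised pre-training of a stacked sparse autoencoder on unlabeled traffic, followed by supervised training on a small labeled set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the layer sizes, the L1 sparsity penalty, the learning rates, and the helper names `train_sparse_ae`, `encode`, `train_softmax`, and `demo` are all hypothetical. The sketch also trains only a softmax classifier on top of frozen encoders, which is a simplification of full fine-tuning.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_ae(X, n_hidden, l1=1e-3, lr=0.1, epochs=200, seed=0):
    """Train one sparse autoencoder layer on unlabeled X; return encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, n_hidden)); b = np.zeros(n_hidden)  # encoder
    V = rng.normal(0, 0.1, (n_hidden, d)); c = np.zeros(d)         # decoder
    for _ in range(epochs):
        H = sigmoid(X @ W + b)       # hidden code, values in (0, 1)
        R = H @ V + c                # linear reconstruction of the input
        dR = 2.0 * (R - X) / n       # gradient of the squared reconstruction error
        dH = dR @ V.T + l1 / n       # plus gradient of the L1 sparsity penalty (H > 0)
        dZ = dH * H * (1.0 - H)      # back through the sigmoid
        V -= lr * (H.T @ dR); c -= lr * dR.sum(axis=0)
        W -= lr * (X.T @ dZ); b -= lr * dZ.sum(axis=0)
    return W, b

def encode(X, layers):
    """Pass X through the stacked encoders."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

def train_softmax(Z, y, n_classes, lr=0.5, epochs=500):
    """Supervised step: softmax regression on encoded labeled samples."""
    n, d = Z.shape
    W = np.zeros((d, n_classes)); b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]         # one-hot targets
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits); P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n              # gradient of the mean cross-entropy
        W -= lr * (Z.T @ G); b -= lr * G.sum(axis=0)
    return W, b

def demo(seed=0):
    """Synthetic stand-in for traffic features: many unlabeled, few labeled samples."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0] * 8, [1.5] * 8])
    y_all = rng.integers(0, 2, 1000)
    X_all = centers[y_all] + rng.normal(0, 1.0, (1000, 8))
    X_unlab = X_all[:900]                          # labels deliberately ignored
    X_lab, y_lab = X_all[900:920], y_all[900:920]  # small labeled set
    X_test, y_test = X_all[920:], y_all[920:]

    # Unsupervised pre-training: greedy, one layer at a time.
    layer1 = train_sparse_ae(X_unlab, 6, seed=seed)
    layer2 = train_sparse_ae(encode(X_unlab, [layer1]), 4, seed=seed + 1)
    layers = [layer1, layer2]

    # Supervised training on the few labeled samples, encoders kept frozen.
    W, b = train_softmax(encode(X_lab, layers), y_lab, n_classes=2)
    pred = (encode(X_test, layers) @ W + b).argmax(axis=1)
    return float((pred == y_test).mean())

if __name__ == "__main__":
    print("test accuracy:", demo())
```

The point of the layout is that the encoders see all 900 unlabeled samples while the classifier sees only 20 labels. Varying the size of `X_unlab` in `demo` is one way to probe the effect of unlabeled data in this toy setting, though it says nothing about the results reported in the article.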