
A mathematical analysis about the geo-temporal characterization of the multi-class maliciousness of an IP address

Bibliographic Details
Published in: Wireless Networks 2024-08, Vol. 30 (6), p. 5033-5048
Main Authors: DeCastro-García, Noemí; Escudero García, David; Carriegos, Miguel V.
Format: Article
Language: English
Description
Summary: The severity of a cybersecurity event involving potentially malicious activity is crucial for determining an appropriate response. Machine learning techniques are used to build models that characterize events correctly and provide an accurate maliciousness measure. Such a model is constructed from a collection of features able to characterize the maliciousness of IP addresses. As a first approach, a set of contextual features has been selected; these features are simple and inexpensive to extract. The set contains 23 features: 18 obtained through time series analysis, and the other 5 derived directly from the spatial (geolocation) and temporal (time of occurrence) similarity of the events. The feature set is tested with statistical analyses to determine which features are the most effective and how hyperparameter selection affects the construction of a time series of features. In addition, the effect of IP geolocation shifts on model performance is studied through concept drift theory. The results indicate that, overall, the feature set provides adequate performance for the task, although more complex features may be required to achieve better performance. The best results are obtained with the geolocation and time-of-occurrence features.
ISSN: 1022-0038, 1572-8196
DOI: 10.1007/s11276-022-03215-2