Loading…

Detection of Algorithmically Generated Malicious Domain Names with Feature Fusion of Meaningful Word Segmentation and N-Gram Sequences

Domain generation algorithms (DGAs) play an important role in network attacks and can be mainly divided into two types: dictionary-based and character-based. Dictionary-based algorithmically generated domains (AGDs) are similar in composition to normal domains and are harder to detect. Although meth...

Full description

Saved in:
Bibliographic Details
Published in:Applied sciences 2023-04, Vol.13 (7), p.4406
Main Authors: Chen, Shaojie, Lang, Bo, Chen, Yikai, Xie, Chong
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Domain generation algorithms (DGAs) play an important role in network attacks and can be mainly divided into two types: dictionary-based and character-based. Dictionary-based algorithmically generated domains (AGDs) are similar in composition to normal domains and are harder to detect. Although methods based on meaningful word segmentation and n-gram sequence features exhibit good detection performance for AGDs, they are inadequate for mining meaningful word features of domain names, and the performance of hybrid detection of character-based and dictionary-based AGDs needs to be further improved. Therefore, in this paper, we first describe the composition of dictionary-based AGDs using meaningful word segmentation, introduce the standard deviation to better measure the word distribution features, and construct additional 11-dimensional statistical features for word segmentation results as a supplement. Then, by combining 3-gram and 1-gram sequence features, we improve the detection performance for both character-based and dictionary-based AGDs. Finally, we perform feature fusion of the above four kinds of features to achieve an end-to-end detection method for both kinds of AGDs. Experimental results showed that our method achieved an accuracy of 97.24% on the full dataset and better accuracy and F1 values than existing methods on both dictionary-based and character-based AGD datasets.
ISSN:2076-3417
2076-3417
DOI:10.3390/app13074406