Virtual Sample Generation for Retraining the Malicious PDF Detection Model
Published in: | Journal of physics. Conference series 2020-07, Vol.1584 (1), p.12056 |
---|---|
Format: | Article |
Language: | English |
Summary: | PDF files are widely used to launch cyberattacks because of their popularity and the growing number of related vulnerabilities. Machine learning algorithms have been developed to detect malicious PDF files. As exploits of new vulnerabilities appear, the assumption that the training data and the test data share the same distribution no longer holds, and the original model's ability to detect exploits of new vulnerabilities gradually weakens. In a real environment, it is very difficult to obtain numerous samples of exploits for the same CVE, so the machine learning models are difficult to improve by retraining. Virtual sample generation (VSG) can generate sufficient virtual samples from small sample sets to improve the generalization of the existing model. This paper proposes a new VSG algorithm based on prior knowledge, which performs better than other VSG algorithms at improving the detection of exploits of new vulnerabilities. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/1584/1/012056 |