Loading…

Malware Family Discovery Using Reversible Jump MCMC Sampling of Regimes

Malware is computer software that has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems that have not been fully eradicated; existing pieces o...

Full description

Saved in:
Bibliographic Details
Published in:Journal of the American Statistical Association 2018-10, Vol.113 (524), p.1490-1502
Main Authors: Bolton, Alexander D., Heard, Nicholas A.
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Malware is computer software that has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems that have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware, which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump Markov chain Monte Carlo for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.
ISSN:0162-1459
1537-274X
DOI:10.1080/01621459.2018.1423984