Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets

Bibliographic Details
Published in: ACM Computing Surveys, 2024-08, Vol. 56 (8), pp. 1-36, Article 212
Main Authors: Gray, Jason, Sgandurra, Daniele, Cavallaro, Lorenzo, Blasco Alis, Jorge
Format: Article
Language: English
Description: Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution increases the level of difficulty as it mostly relies upon the ability to disassemble binaries to obtain authorship-related features. We perform a systematic analysis of works in the area of malware authorship attribution. We identify key findings and some shortcomings of current approaches and explore the open research challenges. To mitigate the lack of ground-truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 17,513 malware labeled to 275 threat actor groups.
DOI: 10.1145/3653973
Publisher: New York, NY, USA: ACM
Abbreviated title: ACM CSUR
Publication date: 2024-08-01
Pages: 36
Rights: Copyright Association for Computing Machinery Aug 2024
Peer reviewed: yes
Open access: free to read
ORCID iDs: 0000-0003-4392-9023; 0000-0001-5238-8068; 0000-0003-3518-7023; 0000-0002-3878-2680
ISSN: 0360-0300
EISSN: 1557-7341
Source: Business Source Ultimate; Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
Subjects: Applied computing; Computer science; Datasets; Evidence collection, storage and analysis; General and reference; Intelligence gathering; Investigation techniques; Malware; Malware and its mitigation; Pseudonymity, anonymity and untraceability; Security and privacy; Surveys and overviews; Threat evaluation