
Large-scale malware classification using random projections and neural networks

Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, an...

Bibliographic Details
Main Authors: Dahl, George E.; Stokes, Jack W.; Li Deng; Dong Yu
Format: Conference Proceeding
Language: English
cited_by cdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3
cites
container_end_page 3426
container_issue
container_start_page 3422
container_title
container_volume
creator Dahl, George E.
Stokes, Jack W.
Li Deng
Dong Yu
description Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.
doi_str_mv 10.1109/ICASSP.2013.6638293
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_6638293</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6638293</ieee_id><sourcerecordid>6638293</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</originalsourceid><addsrcrecordid>eNotkF1LwzAUhqMo2E1_wW7yBzpPkvY0uZShTihMmIJ34zRJR2c_RtIx_PdW3NUDz8XLw8vYQsBSCDCPb6un7fZ9KUGoJaLS0qgrNhNZYQyoHPGaJVIVJhUGvm5YInIJKYrM3LFZjAcA0EWmE7YpKex9Gi21nnfUnil4bluKsakbS2Mz9PwUm37PA_Vu6PgxDAdv_3zkk-G9PwVqJ4znIXzHe3ZbUxv9w4Vz9vny_LFap-XmdUouU6sUjqnFDGtdebBGGEdUZVAbIGnRAqqaaqN1lSGC8AId5iqnyqFzIKXTCio1Z4v_3cZ7vzuGpqPws7scoX4BEqJSOg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Large-scale malware classification using random projections and neural networks</title><source>IEEE Xplore All Conference Series</source><creator>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</creator><creatorcontrib>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</creatorcontrib><description>Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. 
Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</description><identifier>ISSN: 1520-6149</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1479903566</identifier><identifier>EISBN: 9781479903566</identifier><identifier>DOI: 10.1109/ICASSP.2013.6638293</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computers ; Error analysis ; Logistics ; Malware ; Malware Classification ; Neural Network ; Neural networks ; Random Projections ; Training ; Vectors</subject><ispartof>2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.3422-3426</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6638293$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6638293$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dahl, George E.</creatorcontrib><creatorcontrib>Stokes, Jack W.</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Dong Yu</creatorcontrib><title>Large-scale malware classification using random projections and neural networks</title><title>2013 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>Automatically generated malware is a significant problem for computer users. 
Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</description><subject>Computers</subject><subject>Error analysis</subject><subject>Logistics</subject><subject>Malware</subject><subject>Malware Classification</subject><subject>Neural Network</subject><subject>Neural networks</subject><subject>Random Projections</subject><subject>Training</subject><subject>Vectors</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>1479903566</isbn><isbn>9781479903566</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkF1LwzAUhqMo2E1_wW7yBzpPkvY0uZShTihMmIJ34zRJR2c_RtIx_PdW3NUDz8XLw8vYQsBSCDCPb6un7fZ9KUGoJaLS0qgrNhNZYQyoHPGaJVIVJhUGvm5YInIJKYrM3LFZjAcA0EWmE7YpKex9Gi21nnfUnil4bluKsakbS2Mz9PwUm37PA_Vu6PgxDAdv_3zkk-G9PwVqJ4znIXzHe3ZbUxv9w4Vz9vny_LFap-XmdUouU6sUjqnFDGtdebBGGEdUZVAbIGnRAqqaaqN1lSGC8AId5iqnyqFzIKXTCio1Z4v_3cZ7vzuGpqPws7scoX4BEqJSOg</recordid><startdate>201305</startdate><enddate>201305</enddate><creator>Dahl, George E.</creator><creator>Stokes, Jack 
W.</creator><creator>Li Deng</creator><creator>Dong Yu</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201305</creationdate><title>Large-scale malware classification using random projections and neural networks</title><author>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Computers</topic><topic>Error analysis</topic><topic>Logistics</topic><topic>Malware</topic><topic>Malware Classification</topic><topic>Neural Network</topic><topic>Neural networks</topic><topic>Random Projections</topic><topic>Training</topic><topic>Vectors</topic><toplevel>online_resources</toplevel><creatorcontrib>Dahl, George E.</creatorcontrib><creatorcontrib>Stokes, Jack W.</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Dong Yu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dahl, George E.</au><au>Stokes, Jack W.</au><au>Li Deng</au><au>Dong Yu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Large-scale malware classification using random projections and neural networks</atitle><btitle>2013 IEEE International Conference on Acoustics, Speech and Signal 
Processing</btitle><stitle>ICASSP</stitle><date>2013-05</date><risdate>2013</risdate><spage>3422</spage><epage>3426</epage><pages>3422-3426</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><eisbn>1479903566</eisbn><eisbn>9781479903566</eisbn><abstract>Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2013.6638293</doi><tpages>5</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1520-6149
ispartof 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.3422-3426
issn 1520-6149
2379-190X
language eng
recordid cdi_ieee_primary_6638293
source IEEE Xplore All Conference Series
subjects Computers
Error analysis
Logistics
Malware
Malware Classification
Neural Network
Neural networks
Random Projections
Training
Vectors
title Large-scale malware classification using random projections and neural networks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T23%3A45%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Large-scale%20malware%20classification%20using%20random%20projections%20and%20neural%20networks&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Dahl,%20George%20E.&rft.date=2013-05&rft.spage=3422&rft.epage=3426&rft.pages=3422-3426&rft.issn=1520-6149&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2013.6638293&rft.eisbn=1479903566&rft.eisbn_list=9781479903566&rft_dat=%3Cieee_CHZPO%3E6638293%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6638293&rfr_iscdi=true