Large-scale malware classification using random projections and neural networks
Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, an...
Main Authors: | Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu |
---|---|
Format: | Conference Proceeding |
Language: | English |
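Subjects: | Computers ; Error analysis ; Logistics ; Malware ; Malware Classification ; Neural Network ; Neural networks ; Random Projections ; Training ; Vectors |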
cited_by | cdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3 |
---|---|
cites | |
container_end_page | 3426 |
container_issue | |
container_start_page | 3422 |
container_title | |
container_volume | |
creator | Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu |
description | Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks. |
doi_str_mv | 10.1109/ICASSP.2013.6638293 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_6638293</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6638293</ieee_id><sourcerecordid>6638293</sourcerecordid><originalsourceid>FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</originalsourceid><addsrcrecordid>eNotkF1LwzAUhqMo2E1_wW7yBzpPkvY0uZShTihMmIJ34zRJR2c_RtIx_PdW3NUDz8XLw8vYQsBSCDCPb6un7fZ9KUGoJaLS0qgrNhNZYQyoHPGaJVIVJhUGvm5YInIJKYrM3LFZjAcA0EWmE7YpKex9Gi21nnfUnil4bluKsakbS2Mz9PwUm37PA_Vu6PgxDAdv_3zkk-G9PwVqJ4znIXzHe3ZbUxv9w4Vz9vny_LFap-XmdUouU6sUjqnFDGtdebBGGEdUZVAbIGnRAqqaaqN1lSGC8AId5iqnyqFzIKXTCio1Z4v_3cZ7vzuGpqPws7scoX4BEqJSOg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Large-scale malware classification using random projections and neural networks</title><source>IEEE Xplore All Conference Series</source><creator>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</creator><creatorcontrib>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</creatorcontrib><description>Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. 
Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</description><identifier>ISSN: 1520-6149</identifier><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 1479903566</identifier><identifier>EISBN: 9781479903566</identifier><identifier>DOI: 10.1109/ICASSP.2013.6638293</identifier><language>eng</language><publisher>IEEE</publisher><subject>Computers ; Error analysis ; Logistics ; Malware ; Malware Classification ; Neural Network ; Neural networks ; Random Projections ; Training ; Vectors</subject><ispartof>2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.3422-3426</ispartof><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6638293$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,23930,23931,25140,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6638293$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Dahl, George E.</creatorcontrib><creatorcontrib>Stokes, Jack W.</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Dong Yu</creatorcontrib><title>Large-scale malware classification using random projections and neural networks</title><title>2013 IEEE International Conference on Acoustics, Speech and Signal Processing</title><addtitle>ICASSP</addtitle><description>Automatically generated malware is a significant problem for computer users. 
Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</description><subject>Computers</subject><subject>Error analysis</subject><subject>Logistics</subject><subject>Malware</subject><subject>Malware Classification</subject><subject>Neural Network</subject><subject>Neural networks</subject><subject>Random Projections</subject><subject>Training</subject><subject>Vectors</subject><issn>1520-6149</issn><issn>2379-190X</issn><isbn>1479903566</isbn><isbn>9781479903566</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkF1LwzAUhqMo2E1_wW7yBzpPkvY0uZShTihMmIJ34zRJR2c_RtIx_PdW3NUDz8XLw8vYQsBSCDCPb6un7fZ9KUGoJaLS0qgrNhNZYQyoHPGaJVIVJhUGvm5YInIJKYrM3LFZjAcA0EWmE7YpKex9Gi21nnfUnil4bluKsakbS2Mz9PwUm37PA_Vu6PgxDAdv_3zkk-G9PwVqJ4znIXzHe3ZbUxv9w4Vz9vny_LFap-XmdUouU6sUjqnFDGtdebBGGEdUZVAbIGnRAqqaaqN1lSGC8AId5iqnyqFzIKXTCio1Z4v_3cZ7vzuGpqPws7scoX4BEqJSOg</recordid><startdate>201305</startdate><enddate>201305</enddate><creator>Dahl, George E.</creator><creator>Stokes, Jack 
W.</creator><creator>Li Deng</creator><creator>Dong Yu</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201305</creationdate><title>Large-scale malware classification using random projections and neural networks</title><author>Dahl, George E. ; Stokes, Jack W. ; Li Deng ; Dong Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Computers</topic><topic>Error analysis</topic><topic>Logistics</topic><topic>Malware</topic><topic>Malware Classification</topic><topic>Neural Network</topic><topic>Neural networks</topic><topic>Random Projections</topic><topic>Training</topic><topic>Vectors</topic><toplevel>online_resources</toplevel><creatorcontrib>Dahl, George E.</creatorcontrib><creatorcontrib>Stokes, Jack W.</creatorcontrib><creatorcontrib>Li Deng</creatorcontrib><creatorcontrib>Dong Yu</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Dahl, George E.</au><au>Stokes, Jack W.</au><au>Li Deng</au><au>Dong Yu</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Large-scale malware classification using random projections and neural networks</atitle><btitle>2013 IEEE International Conference on Acoustics, Speech and Signal 
Processing</btitle><stitle>ICASSP</stitle><date>2013-05</date><risdate>2013</risdate><spage>3422</spage><epage>3426</epage><pages>3422-3426</pages><issn>1520-6149</issn><eissn>2379-190X</eissn><eisbn>1479903566</eisbn><eisbn>9781479903566</eisbn><abstract>Automatically generated malware is a significant problem for computer users. Analysts are able to manually investigate a small number of unknown files, but the best large-scale defense for detecting malware is automated malware classification. Malware classifiers often use sparse binary features, and the number of potential features can be on the order of tens or hundreds of millions. Feature selection reduces the number of features to a manageable number for training simpler algorithms such as logistic regression, but this number is still too large for more complex algorithms such as neural networks. To overcome this problem, we used random projections to further reduce the dimensionality of the original input space. Using this architecture, we train several very large-scale neural network systems with over 2.6 million labeled samples thereby achieving classification results with a two-class error rate of 0.49% for a single neural network and 0.42% for an ensemble of neural networks.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2013.6638293</doi><tpages>5</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1520-6149 |
ispartof | 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, p.3422-3426 |
issn | 1520-6149 ; 2379-190X |
language | eng |
recordid | cdi_ieee_primary_6638293 |
source | IEEE Xplore All Conference Series |
subjects | Computers ; Error analysis ; Logistics ; Malware ; Malware Classification ; Neural Network ; Neural networks ; Random Projections ; Training ; Vectors |
title | Large-scale malware classification using random projections and neural networks |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T23%3A45%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Large-scale%20malware%20classification%20using%20random%20projections%20and%20neural%20networks&rft.btitle=2013%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing&rft.au=Dahl,%20George%20E.&rft.date=2013-05&rft.spage=3422&rft.epage=3426&rft.pages=3422-3426&rft.issn=1520-6149&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2013.6638293&rft.eisbn=1479903566&rft.eisbn_list=9781479903566&rft_dat=%3Cieee_CHZPO%3E6638293%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c336t-c646f8be0c919daab40f90a2c6c063faf988b46601e16d6535abd6dd022d830b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6638293&rfr_iscdi=true |
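
The description in this record says that random projections were used to reduce the dimensionality of sparse binary features before training neural networks. The record does not give the construction, so the following is only a minimal sketch of one common variant, a sparse random projection with entries in {-1, 0, +1}, and not necessarily what the authors implemented. The names `projection_column` and `project`, and the `density` and `seed` parameters, are hypothetical and chosen for illustration.

```python
import random


def projection_column(feature_index, out_dim, density=0.1, seed=0):
    """Return one column of a sparse random matrix as {row: +1 or -1}.

    The column is regenerated on demand from (seed, feature_index), so the
    full matrix over millions of input features is never stored.
    This scheme is an assumption made for illustration.
    """
    rng = random.Random(seed * 1_000_003 + feature_index)
    column = {}
    for row in range(out_dim):
        if rng.random() < density:
            column[row] = 1 if rng.random() < 0.5 else -1
    return column


def project(active_features, out_dim, density=0.1, seed=0):
    """Project a sparse binary vector onto out_dim dense values.

    active_features is the set of indices whose feature value is 1.
    """
    projected = [0.0] * out_dim
    for feature_index in active_features:
        column = projection_column(feature_index, out_dim, density, seed)
        for row, sign in column.items():
            projected[row] += sign
    return projected


# A sample with four active features out of a very large index space.
sample = {3, 17, 4_000_000, 99_999_999}
dense = project(sample, out_dim=16)
print(len(dense))
```

Because each column depends only on the seed and the feature index, the same feature always maps to the same low-dimensional pattern, and the dense output could then be fed to a classifier. A real system would typically also rescale the output and use a vectorized sparse-matrix library rather than pure Python loops.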