Malicious code detection in android: the role of sequence characteristics and disassembling methods
Published in: | International journal of information security 2023-02, Vol.22 (1), p.107-118 |
---|---|
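Main Authors: | Balikcioglu, Pinar G.; Sirlanci, Melih; A. Kucuk, Ozge; Ulukapi, Bulut; Turkmen, Ramazan K.; Acarturk, Cengiz |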
Format: | Article |
Language: | English |
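Subjects: | Accuracy; Coding and Information Theory; Communications Engineering; Computer Communication Networks; Computer Science; Contingency; Cryptology; Cybersecurity; Dismantling; Machine learning; Malware; Management of Computing and Information Systems; Mobile operating systems; Natural language processing; Networks; Operating Systems; Regular Contribution; Representations; Source code |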
container_end_page | 118 |
---|---|
container_issue | 1 |
container_start_page | 107 |
container_title | International journal of information security |
container_volume | 22 |
creator | Balikcioglu, Pinar G.; Sirlanci, Melih; A. Kucuk, Ozge; Ulukapi, Bulut; Turkmen, Ramazan K.; Acarturk, Cengiz |
description | The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, resulting in a large number of benign and malicious applications on various online markets. Since signature-based methods fall short of detecting malicious software effectively given the vast number of applications, machine learning techniques have also become widespread in this field. In this context, reporting accuracy values derived from contingency tables has become a popular and efficient practice in malware detection studies, enabling researchers to compare their methodologies. In this study, we investigate and emphasize factors that may affect the accuracy of the models researchers build, particularly the disassembly method and the characteristics of the input data. First, we developed a model that approaches malware detection from a Natural Language Processing (NLP) perspective using Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool) and examined the results. Our findings show that the disassembly method and the input representation both affect model results. More specifically, the datasets collected with Apktool achieved better results than those from the other two disassemblers. |
doi_str_mv | 10.1007/s10207-022-00626-2 |
format | article |
abbreviated_title | Int. J. Inf. Secur. |
date | 2023-02-01 |
publisher | Springer Berlin Heidelberg (Berlin/Heidelberg) |
rights | The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (CC BY 4.0). |
peer_reviewed | yes |
orcid | https://orcid.org/0000-0002-5443-6868 |
issn | 1615-5262 (print); 1615-5270 (electronic) |
language | eng |
recordid | cdi_proquest_journals_2766574831 |
source | Criminology Collection; EBSCOhost Business Source Ultimate; Social Science Premium Collection; ABI/INFORM Global; Springer Link |
subjects | Accuracy; Coding and Information Theory; Communications Engineering; Computer Communication Networks; Computer Science; Contingency; Cryptology; Cybersecurity; Dismantling; Machine learning; Malware; Management of Computing and Information Systems; Mobile operating systems; Natural language processing; Networks; Operating Systems; Regular Contribution; Representations; Source code |
title | Malicious code detection in android: the role of sequence characteristics and disassembling methods |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T14%3A46%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Malicious%20code%20detection%20in%20android:%20the%20role%20of%20sequence%20characteristics%20and%20disassembling%20methods&rft.jtitle=International%20journal%20of%20information%20security&rft.au=Balikcioglu,%20Pinar%20G.&rft.date=2023-02-01&rft.volume=22&rft.issue=1&rft.spage=107&rft.epage=118&rft.pages=107-118&rft.issn=1615-5262&rft.eissn=1615-5270&rft_id=info:doi/10.1007/s10207-022-00626-2&rft_dat=%3Cproquest_cross%3E2766574831%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c363t-3336488be8f4ce106054b56a29a92f0f33fbab47bc2c1bbfd755fda78b5ded2a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2766574831&rft_id=info:pmid/&rfr_iscdi=true |
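The base units named in the abstract (instruction, basic block, method, class) can be illustrated as different ways of grouping the same opcode stream. The Python below is a hypothetical sketch, not the authors' pipeline. It assumes Apktool-style smali text as input, keeps only opcodes as tokens (registers and operands are dropped), and assumes that a basic block ends at a label or after a branch, switch, return, or throw. Because Apktool writes one `.smali` file per class, the flat opcode sequence of one file stands in for the class-level unit. The function names (`method_units`, `class_sequence`, `method_sequences`, `basic_block_units`) and the `SAMPLE` input are illustrative only.

```python
"""Hypothetical sketch: grouping smali opcodes into different base units.

Assumptions (not from the paper): tokens are opcodes only, and a basic block
ends at a label or after a branch, switch, return, or throw instruction.
"""
from typing import Dict, List, Optional

# Opcode prefixes assumed to end a basic block (illustrative, not exhaustive).
BLOCK_TERMINATORS = ("goto", "if-", "return", "throw", "packed-switch", "sparse-switch")

# Directive blocks whose bodies are data, not instructions, mapped to their end marker.
DATA_BLOCKS: Dict[str, str] = {
    ".annotation": ".end annotation",
    ".array-data": ".end array-data",
    ".packed-switch": ".end packed-switch",
    ".sparse-switch": ".end sparse-switch",
}


def method_units(smali: str) -> List[List[str]]:
    """One list per method: opcodes, plus labels (':name') kept as block markers."""
    methods: List[List[str]] = []
    current: Optional[List[str]] = None
    data_end: Optional[str] = None
    for raw in smali.splitlines():
        line = raw.strip()
        if data_end is not None:
            # Skip payload lines until the matching .end directive.
            if line.startswith(data_end):
                data_end = None
            continue
        if line.startswith(".method"):
            current = []
        elif line.startswith(".end method"):
            if current is not None:
                methods.append(current)
            current = None
        elif current is None or not line or line.startswith("#"):
            continue  # outside a method, blank line, or comment
        elif line.split()[0] in DATA_BLOCKS:
            data_end = DATA_BLOCKS[line.split()[0]]
        elif not line.startswith("."):
            # Keep labels whole; keep only the opcode of an instruction.
            current.append(line if line.startswith(":") else line.split()[0])
    return methods


def class_sequence(methods: List[List[str]]) -> List[str]:
    """Class level: one flat opcode sequence for the whole .smali file."""
    return [tok for m in methods for tok in m if not tok.startswith(":")]


def method_sequences(methods: List[List[str]]) -> List[List[str]]:
    """Method level: one opcode sequence per method, labels removed."""
    return [[tok for tok in m if not tok.startswith(":")] for m in methods]


def basic_block_units(methods: List[List[str]]) -> List[List[str]]:
    """Basic-block level: split each method at labels and after terminators."""
    blocks: List[List[str]] = []
    for method in methods:
        block: List[str] = []
        for tok in method:
            if tok.startswith(":"):
                # A label is a possible jump target, so it opens a new block.
                if block:
                    blocks.append(block)
                block = []
                continue
            block.append(tok)
            if tok.startswith(BLOCK_TERMINATORS):
                blocks.append(block)
                block = []
        if block:
            blocks.append(block)
    return blocks


SAMPLE = """
.class public Lcom/example/Demo;
.super Ljava/lang/Object;

.method public static check(I)I
    .locals 1
    if-eqz p0, :cond_0
    const/4 v0, 0x1
    return v0
    :cond_0
    const/4 v0, 0x0
    return v0
.end method
"""

if __name__ == "__main__":
    units = method_units(SAMPLE)
    print(class_sequence(units))     # instruction tokens for the class
    print(method_sequences(units))   # one sequence per method
    print(basic_block_units(units))  # one sequence per basic block
```

Each resulting sequence would still have to be mapped to integer token ids before being fed to an LSTM; that step, and the model itself, are outside this sketch.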