Malicious code detection in android: the role of sequence characteristics and disassembling methods

The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, which resulted in a significant number of benign and malicious applications available on various online markets. Since the signature-based methods fall short for de...

Published in: International journal of information security 2023-02, Vol.22 (1), p.107-118
Main Authors: Balikcioglu, Pinar G., Sirlanci, Melih, A. Kucuk, Ozge, Ulukapi, Bulut, Turkmen, Ramazan K., Acarturk, Cengiz
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c363t-3336488be8f4ce106054b56a29a92f0f33fbab47bc2c1bbfd755fda78b5ded2a3
cites cdi_FETCH-LOGICAL-c363t-3336488be8f4ce106054b56a29a92f0f33fbab47bc2c1bbfd755fda78b5ded2a3
container_end_page 118
container_issue 1
container_start_page 107
container_title International journal of information security
container_volume 22
creator Balikcioglu, Pinar G.
Sirlanci, Melih
A. Kucuk, Ozge
Ulukapi, Bulut
Turkmen, Ramazan K.
Acarturk, Cengiz
description The acceptance and widespread use of the Android operating system has drawn the attention of both legitimate developers and malware authors, resulting in a large number of benign and malicious applications on various online markets. Because signature-based methods cannot detect malicious software effectively across so many applications, machine learning techniques have become widespread in this field. In this context, reporting the accuracy values obtained from contingency tables has become a popular and efficient practice in malware detection studies, and it lets researchers compare their methodologies. In this study, we investigate and emphasize the factors that may affect the accuracy values of researchers' models, particularly the disassembly method and the characteristics of the input data. First, we developed a model that treats malware detection as a Natural Language Processing (NLP) problem and uses Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and with representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool), and examined the results. Our findings show that both the disassembly method and the input representation affect the model results. More specifically, the datasets produced with Apktool achieved better results than those produced with the other two disassemblers.
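The base units named in the description (instruction, basic block, method, and class) can be pictured as different ways of cutting one opcode stream into sequences. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: the smali snippet, the `split_units` helper, and the crude basic-block rule (a block ends at a branch, return, or throw opcode and a label starts a new one) are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical, simplified smali text of the kind a disassembler such as
# Apktool emits; it is NOT taken from the paper's dataset.
SMALI = """\
.class public Lcom/example/Demo;
.method public run()V
    const/4 v0, 0x1
    if-eqz v0, :cond_0
    invoke-virtual {p0}, Lcom/example/Demo;->a()V
    :cond_0
    return-void
.end method
.method public a()V
    return-void
.end method
"""

# Assumption: opcodes with these prefixes terminate a basic block.
BLOCK_ENDERS = ("if-", "goto", "return", "throw")


def split_units(smali: str) -> Dict[str, List[List[str]]]:
    """Group the opcodes of one class into sequences at four granularities."""
    methods: List[List[str]] = []
    blocks: List[List[str]] = []
    current_block: List[str] = []
    in_method = False

    def close_block() -> None:
        # Store the block collected so far (if any) and start a fresh one.
        nonlocal current_block
        if current_block:
            blocks.append(current_block)
            current_block = []

    for raw in smali.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(".method"):
            in_method = True
            methods.append([])
        elif line.startswith(".end method"):
            close_block()
            in_method = False
        elif in_method and line.startswith(":"):
            close_block()  # a label begins a new basic block
        elif in_method and not line.startswith("."):
            opcode = line.split()[0]
            methods[-1].append(opcode)
            current_block.append(opcode)
            if opcode.startswith(BLOCK_ENDERS):
                close_block()

    flat = [op for method in methods for op in method]
    return {
        "instruction": [[op] for op in flat],  # one opcode per sequence
        "basic_block": blocks,
        "method": methods,
        "class": [flat],                       # whole class as one sequence
    }


units = split_units(SMALI)
for name, sequences in units.items():
    print(name, len(sequences))
```

Each list of opcode sequences could then be mapped to integer tokens and fed to a sequence model such as an LSTM; how the paper actually encodes its inputs is not stated in this record.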
doi_str_mv 10.1007/s10207-022-00626-2
format article
publisher Berlin/Heidelberg: Springer Berlin Heidelberg
rights The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”).
orcid https://orcid.org/0000-0002-5443-6868
tpages 12
fulltext fulltext
identifier ISSN: 1615-5262
ispartof International journal of information security, 2023-02, Vol.22 (1), p.107-118
issn 1615-5262
1615-5270
language eng
recordid cdi_proquest_journals_2766574831
source Criminology Collection; EBSCOhost Business Source Ultimate; Social Science Premium Collection; ABI/INFORM Global; Springer Link
subjects Accuracy
Coding and Information Theory
Communications Engineering
Computer Communication Networks
Computer Science
Contingency
Cryptology
Cybersecurity
Dismantling
Machine learning
Malware
Management of Computing and Information Systems
Mobile operating systems
Natural language processing
Networks
Operating Systems
Regular Contribution
Representations
Source code
title Malicious code detection in android: the role of sequence characteristics and disassembling methods
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T14%3A46%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Malicious%20code%20detection%20in%20android:%20the%20role%20of%20sequence%20characteristics%20and%20disassembling%20methods&rft.jtitle=International%20journal%20of%20information%20security&rft.au=Balikcioglu,%20Pinar%20G.&rft.date=2023-02-01&rft.volume=22&rft.issue=1&rft.spage=107&rft.epage=118&rft.pages=107-118&rft.issn=1615-5262&rft.eissn=1615-5270&rft_id=info:doi/10.1007/s10207-022-00626-2&rft_dat=%3Cproquest_cross%3E2766574831%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c363t-3336488be8f4ce106054b56a29a92f0f33fbab47bc2c1bbfd755fda78b5ded2a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2766574831&rft_id=info:pmid/&rfr_iscdi=true