Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks

Bibliographic Details
Published in: IEEE Access 2021, Vol. 9, pp. 87962-87971
Main Authors: Sun, Bo, Ban, Tao, Han, Chansu, Takahashi, Takeshi, Yoshioka, Katsunari, Takeuchi, Jun'ichi, Sarrafzadeh, Abdolhossein, Qiu, Meikang, Inoue, Daisuke
Format: Article
Language: English
cites cdi_FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93
container_end_page 87971
container_start_page 87962
container_title IEEE Access
container_volume 9
creator Sun, Bo
Ban, Tao
Han, Chansu
Takahashi, Takeshi
Yoshioka, Katsunari
Takeuchi, Jun'ichi
Sarrafzadeh, Abdolhossein
Qiu, Meikang
Inoue, Daisuke
description Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack relies on a carefully crafted email message to persuade the victim to take a specific, seemingly innocuous action, such as opening a link or an attachment, and downloading and installing a software program. For such an attack to succeed without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. Analyzing such decoy documents can provide crucial information for inferring and identifying the targeted, or potentially harmed, victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method achieves high prediction accuracy: the best result, obtained on a collection of 200 Chinese decoy documents, was an accuracy of 97.5%, an F-measure of 97.9%, and a low false-positive rate (FPR) of 3.1%. The proposed scheme can be deployed at various access points to strengthen defenses against targeted email attacks on a range of targets.
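The abstract describes the approach only at a high level: natural language processing features feed a machine-learning classifier, which is scored by accuracy, F-measure and FPR. The block below is a minimal sketch of that kind of pipeline, not the authors' method. The choices of character-bigram features (which sidestep Chinese word segmentation), a multinomial Naive Bayes classifier with add-one smoothing, and label 1 meaning "deceptive decoy document" are all our own assumptions. The names `char_bigrams`, `NaiveBayes` and `evaluate` are hypothetical, and the toy documents are invented; they do not reproduce the paper's data or results.

```python
# Illustrative sketch only; NOT the pipeline from the paper.
# Assumptions: character-bigram features, multinomial Naive Bayes with
# add-one (Laplace) smoothing, and label 1 = deceptive decoy document.
import math
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes over bigram counts (hypothetical helper)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior probability of each class.
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Per-class bigram frequency tables.
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(char_bigrams(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        feats = char_bigrams(doc)
        v = len(self.vocab)

        def log_posterior(c):
            # Add-one smoothing so unseen bigrams do not zero out a class.
            return self.prior[c] + sum(
                math.log((self.counts[c][f] + 1) / (self.totals[c] + v)) for f in feats
            )

        return max(self.classes, key=log_posterior)


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, F-measure (F1) and false-positive rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "f1": f1, "fpr": fpr}


# Hypothetical toy data: official-looking notices (1) vs. unrelated text (0).
train_docs = [
    "关于召开年度工作会议的通知",  # notice of the annual work meeting
    "外交部新闻发布会答问记录",    # foreign-ministry press briefing transcript
    "今天天气晴朗适合出游",        # nice weather today, good for an outing
    "周末家庭聚餐菜单",            # weekend family dinner menu
]
train_labels = [1, 1, 0, 0]
model = NaiveBayes().fit(train_docs, train_labels)

test_docs = ["关于召开工作会议的通知", "周末天气晴朗"]
test_labels = [1, 0]
predictions = [model.predict(d) for d in test_docs]
print(predictions, evaluate(test_labels, predictions))
# [1, 0] {'accuracy': 1.0, 'f1': 1.0, 'fpr': 0.0}
```

The three metrics follow their standard definitions: accuracy is (TP + TN) / total, F1 is 2TP / (2TP + FP + FN), and FPR is FP / (FP + TN). A low FPR matters here because each false positive flags a benign document as a likely decoy. A real system would swap in richer NLP features and a stronger classifier, and would report cross-validated scores rather than results on a two-document test set.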
doi_str_mv 10.1109/ACCESS.2021.3082000
format article
publisher Piscataway: IEEE
coden IAECCG
ieee_id 9435284
tpages 10
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2021
toplevel peer_reviewed
oa free_for_read
linktohtml https://ieeexplore.ieee.org/document/9435284
orcidid https://orcid.org/0000-0002-9616-3212
https://orcid.org/0000-0002-1728-5300
https://orcid.org/0000-0002-7822-3672
https://orcid.org/0000-0002-5819-3082
https://orcid.org/0000-0002-6477-7770
fulltext fulltext
identifier ISSN: 2169-3536
ispartof IEEE Access, 2021, Vol. 9, pp. 87962-87971
issn 2169-3536
2169-3536
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_dd70700bf3a14f6ca626cd8f2b617564
source IEEE Xplore Open Access Journals
subjects Cybersecurity
decoy document
Electronic mail
Feature extraction
Machine learning
Metadata
Natural language processing
Phishing
Software
Targeted email attack
Task analysis
title Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T17%3A00%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Machine%20Learning%20Techniques%20to%20Identify%20Deceptive%20Decoy%20Documents%20Associated%20With%20Targeted%20Email%20Attacks&rft.jtitle=IEEE%20access&rft.au=Sun,%20Bo&rft.date=2021&rft.volume=9&rft.spage=87962&rft.epage=87971&rft.pages=87962-87971&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3082000&rft_dat=%3Cproquest_doaj_%3E2544298128%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2544298128&rft_id=info:pmid/&rft_ieee_id=9435284&rfr_iscdi=true