Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks
Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachmen...
Published in: | IEEE access 2021, Vol.9, p.87962-87971 |
---|---|
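Main Authors: | Sun, Bo; Ban, Tao; Han, Chansu; Takahashi, Takeshi; Yoshioka, Katsunari; Takeuchi, Jun'ichi; Sarrafzadeh, Abdolhossein; Qiu, Meikang; Inoue, Daisuke |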
Format: | Article |
Language: | English |
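Subjects: | Cybersecurity; decoy document; Electronic mail; Feature extraction; Machine learning; Metadata; Natural language processing; Phishing; Software; Targeted email attack; Task analysis |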
Citations: | Items that this one cites |
Online Access: | Get full text |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93 |
container_end_page | 87971 |
container_issue | |
container_start_page | 87962 |
container_title | IEEE access |
container_volume | 9 |
creator | Sun, Bo; Ban, Tao; Han, Chansu; Takahashi, Takeshi; Yoshioka, Katsunari; Takeuchi, Jun'ichi; Sarrafzadeh, Abdolhossein; Qiu, Meikang; Inoue, Daisuke |
description | Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to perform a specific, seemingly innocuous action, such as opening a link or an attachment, or downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents) must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9%, and a low false positive rate (FPR) of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets. |
doi_str_mv | 10.1109/ACCESS.2021.3082000 |
format | article |
fullrecord | <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_dd70700bf3a14f6ca626cd8f2b617564</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9435284</ieee_id><doaj_id>oai_doaj_org_article_dd70700bf3a14f6ca626cd8f2b617564</doaj_id><sourcerecordid>2544298128</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93</originalsourceid><addsrcrecordid>eNpNUcFuGyEUXFWp1CjJF-SClLMdHrDAHi3HSS256iGuekSYfWvj2IsDOFL-vmw3inJi3jAzDzFVdQt0CkCb-9l8vnh-njLKYMqpZpTSb9UlA9lMeM3lxRf8o7pJaV8EVBeqVpdVWuEbRrv1_Zb8sm7neyQrtLEfiDW6Xe9fz5hIDmTZYp99904e0OEp-zccUChzcOdjuUtkllJw3mZsyV-fd2Rt4xaHaXG0_kBmOVv3kq6r7509JLz5OK-qP4-L9fznZPX7aTmfrSaO1zpPlFAOamRcc1AA4KRSiJKCdc6JthYFMQQQjRSMulZLJjctaLAoVbdp-FW1HHPbYPfmFP3RxncTrDf_iRC3xsbs3QFN2yqqKN103ILopLMlqyR2bCNB1VKUrLsx6xTD8CHZ7MM59uX5htVCsEYD00XFR5WLIaWI3edWoGYoy4xlmaEs81FWcd2OLo-In45G8Jppwf8BWsGPqQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2544298128</pqid></control><display><type>article</type><title>Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks</title><source>IEEE Xplore Open Access Journals</source><creator>Sun, Bo ; Ban, Tao ; Han, Chansu ; Takahashi, Takeshi ; Yoshioka, Katsunari ; Takeuchi, Jun'ichi ; Sarrafzadeh, Abdolhossein ; Qiu, Meikang ; Inoue, Daisuke</creator><creatorcontrib>Sun, Bo ; Ban, Tao ; Han, Chansu ; Takahashi, Takeshi ; Yoshioka, Katsunari ; Takeuchi, Jun'ichi ; Sarrafzadeh, Abdolhossein ; Qiu, Meikang ; Inoue, Daisuke</creatorcontrib><description>Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. 
A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents ), must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.</description><identifier>ISSN: 2169-3536</identifier><identifier>EISSN: 2169-3536</identifier><identifier>DOI: 10.1109/ACCESS.2021.3082000</identifier><identifier>CODEN: IAECCG</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Cybersecurity ; decoy document ; Electronic mail ; Feature extraction ; Machine learning ; Metadata ; Natural language processing ; Phishing ; Software ; Targeted email attack ; Task analysis</subject><ispartof>IEEE access, 2021, Vol.9, p.87962-87971</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2021</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93</cites><orcidid>0000-0002-9616-3212 ; 0000-0002-1728-5300 ; 0000-0002-7822-3672 ; 0000-0002-5819-3082 ; 0000-0002-6477-7770</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9435284$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,4024,27633,27923,27924,27925,54933</link.rule.ids></links><search><creatorcontrib>Sun, Bo</creatorcontrib><creatorcontrib>Ban, Tao</creatorcontrib><creatorcontrib>Han, Chansu</creatorcontrib><creatorcontrib>Takahashi, Takeshi</creatorcontrib><creatorcontrib>Yoshioka, Katsunari</creatorcontrib><creatorcontrib>Takeuchi, Jun'ichi</creatorcontrib><creatorcontrib>Sarrafzadeh, Abdolhossein</creatorcontrib><creatorcontrib>Qiu, Meikang</creatorcontrib><creatorcontrib>Inoue, Daisuke</creatorcontrib><title>Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks</title><title>IEEE access</title><addtitle>Access</addtitle><description>Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents ), must contain content that is highly relevant to the target. 
An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.</description><subject>Cybersecurity</subject><subject>decoy document</subject><subject>Electronic mail</subject><subject>Feature extraction</subject><subject>Machine learning</subject><subject>Metadata</subject><subject>Natural language processing</subject><subject>Phishing</subject><subject>Software</subject><subject>Targeted email attack</subject><subject>Task analysis</subject><issn>2169-3536</issn><issn>2169-3536</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2021</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><sourceid>DOA</sourceid><recordid>eNpNUcFuGyEUXFWp1CjJF-SClLMdHrDAHi3HSS256iGuekSYfWvj2IsDOFL-vmw3inJi3jAzDzFVdQt0CkCb-9l8vnh-njLKYMqpZpTSb9UlA9lMeM3lxRf8o7pJaV8EVBeqVpdVWuEbRrv1_Zb8sm7neyQrtLEfiDW6Xe9fz5hIDmTZYp99904e0OEp-zccUChzcOdjuUtkllJw3mZsyV-fd2Rt4xaHaXG0_kBmOVv3kq6r7509JLz5OK-qP4-L9fznZPX7aTmfrSaO1zpPlFAOamRcc1AA4KRSiJKCdc6JthYFMQQQjRSMulZLJjctaLAoVbdp-FW1HHPbYPfmFP3RxncTrDf_iRC3xsbs3QFN2yqqKN103ILopLMlqyR2bCNB1VKUrLsx6xTD8CHZ7MM59uX5htVCsEYD00XFR5WLIaWI3edWoGYoy4xlmaEs81FWcd2OLo-In45G8Jppwf8BWsGPqQ</recordid><startdate>2021</startdate><enddate>2021</enddate><creator>Sun, Bo</creator><creator>Ban, Tao</creator><creator>Han, Chansu</creator><creator>Takahashi, 
Takeshi</creator><creator>Yoshioka, Katsunari</creator><creator>Takeuchi, Jun'ichi</creator><creator>Sarrafzadeh, Abdolhossein</creator><creator>Qiu, Meikang</creator><creator>Inoue, Daisuke</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7SR</scope><scope>8BQ</scope><scope>8FD</scope><scope>JG9</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-9616-3212</orcidid><orcidid>https://orcid.org/0000-0002-1728-5300</orcidid><orcidid>https://orcid.org/0000-0002-7822-3672</orcidid><orcidid>https://orcid.org/0000-0002-5819-3082</orcidid><orcidid>https://orcid.org/0000-0002-6477-7770</orcidid></search><sort><creationdate>2021</creationdate><title>Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks</title><author>Sun, Bo ; Ban, Tao ; Han, Chansu ; Takahashi, Takeshi ; Yoshioka, Katsunari ; Takeuchi, Jun'ichi ; Sarrafzadeh, Abdolhossein ; Qiu, Meikang ; Inoue, Daisuke</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Cybersecurity</topic><topic>decoy document</topic><topic>Electronic mail</topic><topic>Feature extraction</topic><topic>Machine learning</topic><topic>Metadata</topic><topic>Natural language processing</topic><topic>Phishing</topic><topic>Software</topic><topic>Targeted email attack</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Bo</creatorcontrib><creatorcontrib>Ban, 
Tao</creatorcontrib><creatorcontrib>Han, Chansu</creatorcontrib><creatorcontrib>Takahashi, Takeshi</creatorcontrib><creatorcontrib>Yoshioka, Katsunari</creatorcontrib><creatorcontrib>Takeuchi, Jun'ichi</creatorcontrib><creatorcontrib>Sarrafzadeh, Abdolhossein</creatorcontrib><creatorcontrib>Qiu, Meikang</creatorcontrib><creatorcontrib>Inoue, Daisuke</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE/IET Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Engineered Materials Abstracts</collection><collection>METADEX</collection><collection>Technology Research Database</collection><collection>Materials Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>IEEE access</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Bo</au><au>Ban, Tao</au><au>Han, Chansu</au><au>Takahashi, Takeshi</au><au>Yoshioka, Katsunari</au><au>Takeuchi, Jun'ichi</au><au>Sarrafzadeh, Abdolhossein</au><au>Qiu, Meikang</au><au>Inoue, Daisuke</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks</atitle><jtitle>IEEE 
access</jtitle><stitle>Access</stitle><date>2021</date><risdate>2021</risdate><volume>9</volume><spage>87962</spage><epage>87971</epage><pages>87962-87971</pages><issn>2169-3536</issn><eissn>2169-3536</eissn><coden>IAECCG</coden><abstract>Detecting and preventing targeted email attacks is a long-standing challenge in cybersecurity research and practice. A typical targeted email attack capitalizes on a sophisticated email message to persuade a victim to run a specific, seemingly innocuous, action such as opening a link or an attachment and downloading and installing a software program. To successfully perform such an attack without being noticed afterwards, the attached exploit documents (hereafter referred to as decoy documents ), must contain content that is highly relevant to the target. An analysis of such decoy documents can provide crucial information for inferring and identifying the targeted or potentially harmed victims. In this paper, we propose an automatic approach that leverages natural language processing and machine learning to identify decoy documents that have a high chance of deceiving the targeted users. The experimental results show that the proposed method provides good prediction accuracy: the best result obtained on a collection of 200 Chinese decoy documents yielded an accuracy of 97.5%, an F-measure of 97.9% and a low FPR of 3.1%. The proposed scheme can be deployed at various access points to fortify the defense against targeted email attacks that threaten various targets.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/ACCESS.2021.3082000</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-9616-3212</orcidid><orcidid>https://orcid.org/0000-0002-1728-5300</orcidid><orcidid>https://orcid.org/0000-0002-7822-3672</orcidid><orcidid>https://orcid.org/0000-0002-5819-3082</orcidid><orcidid>https://orcid.org/0000-0002-6477-7770</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2021, Vol.9, p.87962-87971 |
issn | 2169-3536 (ISSN and EISSN) |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_dd70700bf3a14f6ca626cd8f2b617564 |
source | IEEE Xplore Open Access Journals |
subjects | Cybersecurity; decoy document; Electronic mail; Feature extraction; Machine learning; Metadata; Natural language processing; Phishing; Software; Targeted email attack; Task analysis |
title | Leveraging Machine Learning Techniques to Identify Deceptive Decoy Documents Associated With Targeted Email Attacks |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T17%3A00%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Leveraging%20Machine%20Learning%20Techniques%20to%20Identify%20Deceptive%20Decoy%20Documents%20Associated%20With%20Targeted%20Email%20Attacks&rft.jtitle=IEEE%20access&rft.au=Sun,%20Bo&rft.date=2021&rft.volume=9&rft.spage=87962&rft.epage=87971&rft.pages=87962-87971&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2021.3082000&rft_dat=%3Cproquest_doaj_%3E2544298128%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-747c15e238317111c677ee601accc4d5401a2e11496420cd8626bd181ae67fb93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2544298128&rft_id=info:pmid/&rft_ieee_id=9435284&rfr_iscdi=true |
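The description field reports accuracy, F-measure, and false positive rate (FPR) for a binary decoy-document classifier. As a reading aid, the sketch below shows how these three metrics are conventionally derived from confusion-matrix counts. The class name `ConfusionCounts`, the function names, and the example counts are all hypothetical illustrations; they are not taken from the article and do not reproduce its reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary-classification outcomes."""
    tp: int  # positives correctly flagged
    fp: int  # negatives wrongly flagged
    tn: int  # negatives correctly passed
    fn: int  # positives missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all documents classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f_measure(c: ConfusionCounts) -> float:
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def false_positive_rate(c: ConfusionCounts) -> float:
    """Share of true negatives that were wrongly flagged."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0


# Illustrative counts only (assumed, not from the article).
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy  = {accuracy(example):.3f}")
print(f"F-measure = {f_measure(example):.3f}")
print(f"FPR       = {false_positive_rate(example):.3f}")
```

The abstract does not say which F-measure variant was used; the balanced F1 form is assumed here because it is the most common reading of the term.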