Seeking Nonsense, Looking for Trouble: Efficient Promotional-Infection Detection through Semantic Inconsistency Search
Promotional infection is an attack in which the adversary exploits a website's weakness to inject illicit advertising content. Detection of such an infection is challenging due to its similarity to legitimate advertising activities. An interesting observation we make in our research is that suc...
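Main Authors: | Xiaojing Liao, Kan Yuan, XiaoFeng Wang, Zhongyu Pei, Hao Yang, Jianjun Chen, Haixin Duan, Kun Du, Eihal Alowaisheq, Sumayah Alrwais, Luyi Xing, Raheem Beyah |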
---|---|
Format: | Conference Proceeding |
Language: | English |
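Subjects: | Advertising; Blogs; Drugs; Government; Illicit; NLP; Privacy; promotional infection; Search engines; Searching; Security; semantic; Semantics; Similarity; Uniform resource locators |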
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 723 |
container_issue | |
container_start_page | 707 |
container_title | |
container_volume | |
creator | Xiaojing Liao ; Kan Yuan ; XiaoFeng Wang ; Zhongyu Pei ; Hao Yang ; Jianjun Chen ; Haixin Duan ; Kun Du ; Alowaisheq, Eihal ; Alrwais, Sumayah ; Luyi Xing ; Beyah, Raheem |
description | Promotional infection is an attack in which the adversary exploits a website's weakness to inject illicit advertising content. Detection of such an infection is challenging due to its similarity to legitimate advertising activities. An interesting observation we make in our research is that such an attack almost always incurs a great semantic gap between the infected domain (e.g., a university site) and the content it promotes (e.g., selling cheap viagra). Exploiting this gap, we developed a semantic-based technique, called Semantic Inconsistency Search (SEISE), for efficient and accurate detection of promotional injections on sponsored top-level domains (sTLDs) with explicit semantic meanings. Our approach utilizes Natural Language Processing (NLP) to identify the bad terms (those related to illicit activities such as fake drug selling) most irrelevant to an sTLD's semantics. These terms, which we call irrelevant bad terms (IBTs), are used to query search engines under the sTLD for suspicious domains. Through a semantic analysis of the results page returned by the search engines, SEISE is able to detect the truly infected sites and automatically collect new IBTs from the titles/URLs/snippets of their search result items for finding new infections. Running on 403 sTLDs with an initial 30 seed IBTs, SEISE analyzed 100K fully qualified domain names (FQDNs), and along the way automatically gathered nearly 600 IBTs. In the end, our approach detected 11K infected FQDNs with a false detection rate of 1.5% and over 90% coverage. Our study shows that by effective detection of infected sTLDs, the bar to promotional infections can be substantially raised, since other non-sTLD vulnerable domains typically have much lower Alexa ranks and are therefore much less attractive for underground advertising. Our findings further bring to light the stunning impacts of such promotional attacks, which compromise FQDNs under 3% of .edu and .gov domains and over one thousand gov.cn domains, including those of leading universities such as stanford.edu, mit.edu, princeton.edu, harvard.edu and government institutes such as nsf.gov and nih.gov. We further demonstrate the potential to extend our current technique to protect generic domains such as .com and .org. |
doi_str_mv | 10.1109/SP.2016.48 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2375-1207 |
ispartof | 2016 IEEE Symposium on Security and Privacy (SP), 2016, p.707-723 |
issn | 2375-1207 |
language | eng |
recordid | cdi_ieee_primary_7546531 |
source | IEEE Xplore All Conference Series |
subjects | Advertising ; Blogs ; Drugs ; Government ; Illicit ; NLP ; Privacy ; promotional infection ; Search engines ; Searching ; Security ; semantic ; Semantics ; Similarity ; Uniform resource locators |
title | Seeking Nonsense, Looking for Trouble: Efficient Promotional-Infection Detection through Semantic Inconsistency Search |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A16%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Seeking%20Nonsense,%20Looking%20for%20Trouble:%20Efficient%20Promotional-Infection%20Detection%20through%20Semantic%20Inconsistency%20Search&rft.btitle=2016%20IEEE%20Symposium%20on%20Security%20and%20Privacy%20(SP)&rft.au=Xiaojing%20Liao&rft.date=2016-05-01&rft.spage=707&rft.epage=723&rft.pages=707-723&rft.eissn=2375-1207&rft.coden=IEEPAD&rft_id=info:doi/10.1109/SP.2016.48&rft.eisbn=1509008241&rft.eisbn_list=9781509008247&rft_dat=%3Cproquest_CHZPO%3E1835566662%3C/proquest_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i208t-63a400786ac5c580d0b79bedffc5e0175360dedf03c7e9643065826d64fe63483%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1835566662&rft_id=info:pmid/&rft_ieee_id=7546531&rfr_iscdi=true |