
Seeking Nonsense, Looking for Trouble: Efficient Promotional-Infection Detection through Semantic Inconsistency Search


Bibliographic Details
Main Authors: Xiaojing Liao, Kan Yuan, XiaoFeng Wang, Zhongyu Pei, Hao Yang, Jianjun Chen, Haixin Duan, Kun Du, Eihal Alowaisheq, Sumayah Alrwais, Luyi Xing, Raheem Beyah
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 723
container_issue
container_start_page 707
container_title
container_volume
creator Xiaojing Liao
Kan Yuan
XiaoFeng Wang
Zhongyu Pei
Hao Yang
Jianjun Chen
Haixin Duan
Kun Du
Eihal Alowaisheq
Sumayah Alrwais
Luyi Xing
Raheem Beyah
description Promotional infection is an attack in which the adversary exploits a website's weakness to inject illicit advertising content. Detection of such an infection is challenging due to its similarity to legitimate advertising activities. An interesting observation we make in our research is that such an attack almost always creates a large semantic gap between the infected domain (e.g., a university site) and the content it promotes (e.g., selling cheap viagra). Exploiting this gap, we developed a semantic-based technique, called Semantic Inconsistency Search (SEISE), for efficient and accurate detection of promotional infections on sponsored top-level domains (sTLDs) with explicit semantic meanings. Our approach uses Natural Language Processing (NLP) to identify the bad terms (those related to illicit activities such as selling fake drugs) most irrelevant to an sTLD's semantics. These terms, which we call irrelevant bad terms (IBTs), are used to query search engines under the sTLD for suspicious domains. Through a semantic analysis of the results pages returned by the search engines, SEISE is able to detect the truly infected sites and automatically collect new IBTs from the titles, URLs, and snippets of their search result items for finding new infections. Running on 403 sTLDs with 30 initial seed IBTs, SEISE analyzed 100K fully qualified domain names (FQDNs), and along the way automatically gathered nearly 600 IBTs. In the end, our approach detected 11K infected FQDNs with a false detection rate of 1.5% and over 90% coverage. Our study shows that by effective detection of infected sTLDs, the bar to promotional infections can be substantially raised, since other non-sTLD vulnerable domains typically have much lower Alexa ranks and are therefore much less attractive for underground advertising. Our findings further bring to light the stunning impact of such promotional attacks, which compromise FQDNs under 3% of .edu and .gov domains and over one thousand gov.cn domains, including those of leading universities such as stanford.edu, mit.edu, princeton.edu, and harvard.edu, and government institutes such as nsf.gov and nih.gov. We further demonstrate the potential to extend our current technique to protect generic domains such as .com and .org.
doi_str_mv 10.1109/SP.2016.48
format conference_proceeding
fullrecord IEEE document ID: 7546531
ProQuest ID: 1835566662
Publisher: IEEE
Date: 2016-05-01
EISBN: 1509008241
EISBN: 9781509008247
CODEN: IEEPAD
Pages: 707-723 (17 pages)
Link: https://ieeexplore.ieee.org/document/7546531
fulltext fulltext_linktorsrc
identifier EISSN: 2375-1207
ispartof 2016 IEEE Symposium on Security and Privacy (SP), 2016, p.707-723
issn 2375-1207
language eng
recordid cdi_ieee_primary_7546531
source IEEE Xplore All Conference Series
subjects Advertising
Blogs
Drugs
Government
Illicit
NLP
Privacy
promotional infection
Search engines
Searching
Security
semantic
Semantics
Similarity
Uniform resource locators
title Seeking Nonsense, Looking for Trouble: Efficient Promotional-Infection Detection through Semantic Inconsistency Search
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A16%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Seeking%20Nonsense,%20Looking%20for%20Trouble:%20Efficient%20Promotional-Infection%20Detection%20through%20Semantic%20Inconsistency%20Search&rft.btitle=2016%20IEEE%20Symposium%20on%20Security%20and%20Privacy%20(SP)&rft.au=Xiaojing%20Liao&rft.date=2016-05-01&rft.spage=707&rft.epage=723&rft.pages=707-723&rft.eissn=2375-1207&rft.coden=IEEPAD&rft_id=info:doi/10.1109/SP.2016.48&rft.eisbn=1509008241&rft.eisbn_list=9781509008247&rft_dat=%3Cproquest_CHZPO%3E1835566662%3C/proquest_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i208t-63a400786ac5c580d0b79bedffc5e0175360dedf03c7e9643065826d64fe63483%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1835566662&rft_id=info:pmid/&rft_ieee_id=7546531&rfr_iscdi=true
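
The description field above outlines an iterative loop: query a search engine under an sTLD with irrelevant bad terms (IBTs), flag results whose text is semantically out of place, and harvest new IBTs from the flagged results. The following is a minimal sketch of that loop only, not the authors' implementation. Everything named here is hypothetical: `seise_sketch`, the injected `search` callable, the `stld_vocab` and `bad_vocab` word sets, and the `max_overlap` threshold. A simple word-overlap ratio stands in for the paper's NLP-based semantic analysis, which the record does not detail.

```python
from typing import Callable, Iterable, NamedTuple
from urllib.parse import urlparse


class Result(NamedTuple):
    """One search result item (hypothetical shape)."""
    title: str
    url: str
    snippet: str


def tokens(text: str) -> set[str]:
    """Return the set of lowercase alphabetic words in a text."""
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return set(cleaned.split())


def seise_sketch(
    stld: str,
    seed_ibts: Iterable[str],
    search: Callable[[str], list[Result]],
    stld_vocab: set[str],
    bad_vocab: set[str],
    max_overlap: float = 0.2,
) -> tuple[set[str], set[str]]:
    """Return (flagged hostnames, all IBTs used).

    Assumption: a result is "inconsistent" when at most max_overlap of
    its words belong to the sTLD's reference vocabulary.
    """
    pending = list(seed_ibts)
    seen_ibts = set(pending)
    flagged: set[str] = set()
    while pending:
        term = pending.pop()
        for item in search(f"site:{stld} {term}"):
            words = tokens(item.title + " " + item.snippet)
            if not words:
                continue
            overlap = len(words & stld_vocab) / len(words)
            if overlap > max_overlap:
                continue  # result looks on-topic for this sTLD
            host = urlparse(item.url).hostname
            if host:
                flagged.add(host)
            # Harvest new candidate IBTs from the flagged result.
            for new_term in (words & bad_vocab) - stld_vocab - seen_ibts:
                seen_ibts.add(new_term)
                pending.append(new_term)
    return flagged, seen_ibts
```

Passing the search backend in as a callable keeps the sketch independent of any particular search engine API; the loop terminates because each term is queried at most once.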