Loading…

Few are as Good as Many: An Ontology-Based Tweet Spam Detection Approach

Due to the high popularity of Twitter, spammers tend to favor its use in spreading their commercial messages. In the context of detecting twitter spams, different statistical and behavioral analysis approaches were proposed. However, these techniques suffer from many limitations due to: 1) ongoing c...

Full description

Saved in:
Bibliographic Details
Published in:IEEE access 2018, Vol.6, p.63890-63904
Main Authors: Halawi, Bahia, Mourad, Azzam, Otrok, Hadi, Damiani, Ernesto
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Due to the high popularity of Twitter, spammers tend to favor its use in spreading their commercial messages. In the context of detecting twitter spams, different statistical and behavioral analysis approaches were proposed. However, these techniques suffer from many limitations due to: 1) ongoing changes to Twitter's streaming API which constrains access to a user's list of followers/followees; 2) spammer's creativity in building diverse messages; 3) use of embedded links and new accounts; and 4) need for analyzing different characteristics about users without their consent. To address the aforementioned challenges, we propose a novel ontology-based approach for spam detection over Twitter during events by analyzing the relationship between ham user tweets versus spams. Our approach relies solely on public tweet messages while performing the analysis and classification tasks. In this context, ontologies are derived and used to generate a dictionary that validates real tweet messages from random topics. Similarity ratio among the dictionary and tweets is used to reflect the legitimacy of the messages. Experiments conducted on real tweet data illustrate that message-to-message techniques achieved a low detection rate compared with our ontology-based approach which outperforms them by approximately 200%, in addition to promising scalability for large data analysis.
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2018.2877685