
BERT- and CNN-based TOBEAT approach for unwelcome tweets detection

Bibliographic Details
Published in:	Social Network Analysis and Mining, 2022-12, Vol. 12 (1), p. 144, Article 144
Main Authors:	Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih
Format:	Article
Language:	English
Subjects:	Advertisements; Applications of Graph Theory and Complex Networks; Artificial neural networks; Bidirectionality; Classifiers; Comparative studies; Computer Science; Crime prevention; Cybercrime; Data collection; Data Mining and Knowledge Discovery; Economics; Electronic publishing; Extraction; Game Theory; Humanities; Law; Machine learning; Methodology of the Social Sciences; Networking; Neural networks; Original Article; Popularity; Semantics; Social and Behav. Sciences; Social media; Social networks; Spamming; Statistics for Social Sciences; Topics; User behavior

Description
Summary:	Social media platforms have become an indispensable part of our lives today, and Twitter is one of the major online social networking platforms. It has recently seen exponential growth, with growing interest from registered users. This popularity attracts cybercriminals (or spammers), who spread malware and advertisements via links shared in tweets and hijack trending topics to capture the attention of legitimate users. These spammers also send unwanted messages, known as spam (the same term used for junk e-mail), and engage in other malicious activity. Spam on Twitter has become a pervasive problem that must be addressed. Several solutions have been proposed to tackle Twitter spam; however, the main existing methods suffer from several limitations and cannot reliably detect spammers on social networks. In this paper, we propose a new approach that extracts new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach is based on BERT (Bidirectional Encoder Representations from Transformers) and a CNN (convolutional neural network). To implement it, we developed a new framework that combines topic-based features with contextual BERT embeddings; the resulting final feature vector is then fed into a supervised classifier. Experimental results on a Twitter data collection show that the CNN is the most suitable classifier for the spam filtering task. Moreover, a comparative study on the Twitter dataset shows that our approach outperforms previously published approaches, achieving 94.97% accuracy, 94.05% precision, 95.88% recall, 94.95% F1-score and 94.92% G-mean. In terms of time consumption, our approach takes 0.5164 seconds per training step.
In percentage terms, this represents a gain of 82% over the TOBEAT-BERT+SVM model, 76.1% over the TOBEAT-BERT+NB model, and 70% over the TOBEAT-BERT+RF model.
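The five reported scores follow standard confusion-matrix definitions. As a minimal sketch (not the authors' evaluation code), the hypothetical function `spam_metrics` below computes them with spam as the positive class. It assumes the common definition G-mean = √(recall × specificity); the paper may use a different variant. The counts in the usage lines are invented for illustration and are not results from the paper.

```python
import math


def spam_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Confusion-matrix metrics for binary spam detection (spam = positive class).

    Hypothetical helper for illustration. Assumes G-mean = sqrt(recall * specificity),
    a common choice for imbalanced data; the paper may define it differently.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0      # share of flagged tweets that are spam
    recall = tp / (tp + fn) if tp + fn else 0.0         # share of spam tweets that get flagged
    specificity = tn / (tn + fp) if tn + fp else 0.0    # share of legitimate tweets left alone
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    g_mean = math.sqrt(recall * specificity)            # geometric mean of per-class recall
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for illustration only (not taken from the paper).
scores = spam_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

Reporting G-mean alongside F1 is useful when spam and legitimate tweets are imbalanced. This is because G-mean stays low for a classifier that performs well on only one of the two classes.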
ISSN:	1869-5450, 1869-5469
DOI:	10.1007/s13278-022-00970-0