BERT- and CNN-based TOBEAT approach for unwelcome tweets detection
Published in: | Social network analysis and mining 2022-12, Vol.12 (1), p.144, Article 144 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it has experienced exponential growth, with growing interest from registered users. This popularity attracts cybercriminals (or spammers), who spread malware and advertisements via links shared in tweets and hijack hot topics to get the attention of legitimate users. These spammers also send violent messages, known as spam or junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to address the problem of Twitter spam. However, the main existing methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach is based on BERT (Bidirectional Encoder Representations from Transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The resulting feature vector is then fed into a supervised classifier for classification. The experimental results, obtained on a Twitter data collection, show that CNN is the most suitable classifier for the spam filtering task. Moreover, the comparative study shows that, on the Twitter data set, our approach outperforms previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model. |
---|---|
ISSN: | 1869-5450 1869-5469 |
DOI: | 10.1007/s13278-022-00970-0 |
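
The five evaluation metrics named in the summary, and the percentage time gain, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names `binary_metrics` and `relative_time_gain` are hypothetical, the counts in the usage lines are made up for illustration, and the G-mean is assumed to be the geometric mean of recall (sensitivity) and specificity, which is a common definition that the record itself does not state.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no zero-division handling).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


def relative_time_gain(baseline_seconds, ours_seconds):
    """Percentage reduction in time per training step versus a baseline."""
    return 100.0 * (baseline_seconds - ours_seconds) / baseline_seconds


# Illustrative counts only; these are not results from the paper.
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
gain = relative_time_gain(baseline_seconds=2.0, ours_seconds=0.5)
```

Under this reading, a "gain of 82%" would mean the per-step training time is 82% lower than that of the baseline model; the baseline timings themselves are not given in this record.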