BERT- and CNN-based TOBEAT approach for unwelcome tweets detection
Published in: | Social network analysis and mining 2022-12, Vol.12 (1), p.144, Article 144 |
---|---|
Main Authors: | Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383 |
---|---|
cites | cdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383 |
container_end_page | |
container_issue | 1 |
container_start_page | 144 |
container_title | Social network analysis and mining |
container_volume | 12 |
creator | Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih |
description | Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform, and it has recently experienced exponential growth, with growing interest from registered users. This popularity attracts cybercriminals (or spammers), who spread malware and advertisements via links shared in tweets and hijack hot topics to get the attention of legitimate users. These spammers also send violent messages, known as spam (or junk e-mail), and engage in other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to address the problem of Twitter spam. However, the main existing methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that extracts new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach is based on BERT (Bidirectional Encoder Representations from Transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The resulting final feature vector is then fed into a supervised classifier. The experimental results, obtained on a Twitter data collection, show that CNN is the most suitable classifier for the spam filtering task. Moreover, the comparative study shows that, on the Twitter data set, our approach outperforms previously published approaches, achieving 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach required 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model. |
doi_str_mv | 10.1007/s13278-022-00970-0 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2920073714</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2920073714</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWGr_gKeA5-jka5Mc21I_QFqQvYc0m2hLu1uTLcV_b3RFb55mDu_zzvAgdE3hlgKou0w5U5oAYwTAKCBwhkZUV4ZIUZnz313CJZrkvAUACpwbqEZoNlu81AS7tsHz5ZKsXQ4NrlezxbTG7nBInfNvOHYJH9tT2PluH3B_CqHPuAl98P2ma6_QRXS7HCY_c4zq-0U9fyTPq4en-fSZeE5NTzwwZZw0SoNj63WoqHbglPSRmSi1EVSZUJkoRGNc40100kvBhfaq8lzzMboZastT78eQe7vtjqktFy0zrHjgioqSYkPKpy7nFKI9pM3epQ9LwX7ZsoMtW2zZb1sWCsQHKJdw-xrSX_U_1CcbmGoh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2920073714</pqid></control><display><type>article</type><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>Social Science Premium Collection (Proquest) (PQ_SDU_P3)</source><source>Springer Link</source><creator>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed Nazih</creator><creatorcontrib>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed Nazih</creatorcontrib><description>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. 
In this context, several solutions have been proposed to reveal the problem of Twitter spam. However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in
accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</description><identifier>ISSN: 1869-5450</identifier><identifier>EISSN: 1869-5469</identifier><identifier>DOI: 10.1007/s13278-022-00970-0</identifier><language>eng</language><publisher>Vienna: Springer Vienna</publisher><subject>Advertisements ; Applications of Graph Theory and Complex Networks ; Artificial neural networks ; Bidirectionality ; Classifiers ; Comparative studies ; Computer Science ; Crime prevention ; Cybercrime ; Data collection ; Data Mining and Knowledge Discovery ; Economics ; Electronic publishing ; Extraction ; Game Theory ; Humanities ; Law ; Machine learning ; Methodology of the Social Sciences ; Networking ; Neural networks ; Original Article ; Popularity ; Semantics ; Social and Behav. Sciences ; Social media ; Social networks ; Spamming ; Statistics for Social Sciences ; Topics ; User behavior</subject><ispartof>Social network analysis and mining, 2022-12, Vol.12 (1), p.144, Article 144</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022. 
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</citedby><cites>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2920073714?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,12846,21393,27923,27924,33222,33610,43732</link.rule.ids></links><search><creatorcontrib>Ouni, Sarra</creatorcontrib><creatorcontrib>Fkih, Fethi</creatorcontrib><creatorcontrib>Omri, Mohamed Nazih</creatorcontrib><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><title>Social network analysis and mining</title><addtitle>Soc. Netw. Anal. Min</addtitle><description>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to reveal the problem of Twitter spam. 
However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in
accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</description><subject>Advertisements</subject><subject>Applications of Graph Theory and Complex Networks</subject><subject>Artificial neural networks</subject><subject>Bidirectionality</subject><subject>Classifiers</subject><subject>Comparative studies</subject><subject>Computer Science</subject><subject>Crime prevention</subject><subject>Cybercrime</subject><subject>Data collection</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Economics</subject><subject>Electronic publishing</subject><subject>Extraction</subject><subject>Game Theory</subject><subject>Humanities</subject><subject>Law</subject><subject>Machine learning</subject><subject>Methodology of the Social Sciences</subject><subject>Networking</subject><subject>Neural networks</subject><subject>Original Article</subject><subject>Popularity</subject><subject>Semantics</subject><subject>Social and Behav. 
Sciences</subject><subject>Social media</subject><subject>Social networks</subject><subject>Spamming</subject><subject>Statistics for Social Sciences</subject><subject>Topics</subject><subject>User behavior</subject><issn>1869-5450</issn><issn>1869-5469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><sourceid>ALSLI</sourceid><sourceid>M2R</sourceid><recordid>eNp9kE1LAzEQhoMoWGr_gKeA5-jka5Mc21I_QFqQvYc0m2hLu1uTLcV_b3RFb55mDu_zzvAgdE3hlgKou0w5U5oAYwTAKCBwhkZUV4ZIUZnz313CJZrkvAUACpwbqEZoNlu81AS7tsHz5ZKsXQ4NrlezxbTG7nBInfNvOHYJH9tT2PluH3B_CqHPuAl98P2ma6_QRXS7HCY_c4zq-0U9fyTPq4en-fSZeE5NTzwwZZw0SoNj63WoqHbglPSRmSi1EVSZUJkoRGNc40100kvBhfaq8lzzMboZastT78eQe7vtjqktFy0zrHjgioqSYkPKpy7nFKI9pM3epQ9LwX7ZsoMtW2zZb1sWCsQHKJdw-xrSX_U_1CcbmGoh</recordid><startdate>20221201</startdate><enddate>20221201</enddate><creator>Ouni, Sarra</creator><creator>Fkih, Fethi</creator><creator>Omri, Mohamed Nazih</creator><general>Springer Vienna</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7XB</scope><scope>88J</scope><scope>8BJ</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FQK</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JBE</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2R</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>20221201</creationdate><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><author>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed 
Nazih</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Advertisements</topic><topic>Applications of Graph Theory and Complex Networks</topic><topic>Artificial neural networks</topic><topic>Bidirectionality</topic><topic>Classifiers</topic><topic>Comparative studies</topic><topic>Computer Science</topic><topic>Crime prevention</topic><topic>Cybercrime</topic><topic>Data collection</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Economics</topic><topic>Electronic publishing</topic><topic>Extraction</topic><topic>Game Theory</topic><topic>Humanities</topic><topic>Law</topic><topic>Machine learning</topic><topic>Methodology of the Social Sciences</topic><topic>Networking</topic><topic>Neural networks</topic><topic>Original Article</topic><topic>Popularity</topic><topic>Semantics</topic><topic>Social and Behav. 
Sciences</topic><topic>Social media</topic><topic>Social networks</topic><topic>Spamming</topic><topic>Statistics for Social Sciences</topic><topic>Topics</topic><topic>User behavior</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ouni, Sarra</creatorcontrib><creatorcontrib>Fkih, Fethi</creatorcontrib><creatorcontrib>Omri, Mohamed Nazih</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection【Remote access available】</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Social Science Database (Alumni Edition)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central</collection><collection>Social Science Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>Advanced Technologies & Aerospace Database (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Social Science Database (ProQuest)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced 
Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Social network analysis and mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ouni, Sarra</au><au>Fkih, Fethi</au><au>Omri, Mohamed Nazih</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</atitle><jtitle>Social network analysis and mining</jtitle><stitle>Soc. Netw. Anal. Min</stitle><date>2022-12-01</date><risdate>2022</risdate><volume>12</volume><issue>1</issue><spage>144</spage><pages>144-</pages><artnum>144</artnum><issn>1869-5450</issn><eissn>1869-5469</eissn><abstract>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to reveal the problem of Twitter spam. However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. 
Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in
accuracy, precision, recall, F1-score, and G-mean, respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</abstract><cop>Vienna</cop><pub>Springer Vienna</pub><doi>10.1007/s13278-022-00970-0</doi></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1869-5450 |
ispartof | Social network analysis and mining, 2022-12, Vol.12 (1), p.144, Article 144 |
issn | 1869-5450 1869-5469 |
language | eng |
recordid | cdi_proquest_journals_2920073714 |
source | International Bibliography of the Social Sciences (IBSS); Social Science Premium Collection (Proquest) (PQ_SDU_P3); Springer Link |
subjects | Advertisements; Applications of Graph Theory and Complex Networks; Artificial neural networks; Bidirectionality; Classifiers; Comparative studies; Computer Science; Crime prevention; Cybercrime; Data collection; Data Mining and Knowledge Discovery; Economics; Electronic publishing; Extraction; Game Theory; Humanities; Law; Machine learning; Methodology of the Social Sciences; Networking; Neural networks; Original Article; Popularity; Semantics; Social and Behav. Sciences; Social media; Social networks; Spamming; Statistics for Social Sciences; Topics; User behavior |
title | BERT- and CNN-based TOBEAT approach for unwelcome tweets detection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T15%3A34%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=BERT-%20and%20CNN-based%20TOBEAT%20approach%20for%20unwelcome%20tweets%20detection&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Ouni,%20Sarra&rft.date=2022-12-01&rft.volume=12&rft.issue=1&rft.spage=144&rft.pages=144-&rft.artnum=144&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-022-00970-0&rft_dat=%3Cproquest_cross%3E2920073714%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2920073714&rft_id=info:pmid/&rfr_iscdi=true |
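
The abstract reports five evaluation metrics and relative training-time gains without restating how they are defined. The Python sketch below shows the conventional definitions under stated assumptions: spam is treated as the positive class, G-mean is taken to be the geometric mean of recall (sensitivity) and specificity, and a time "gain" is read as the relative reduction (t_baseline - t_model) / t_baseline. The paper may define these measures differently, and the function names (`classification_metrics`, `f1_from`, `relative_time_gain`) are hypothetical. Only the standard library is used.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Binary-classification metrics from confusion-matrix counts.

    Assumes spam is the positive class; every denominator must be non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity / true-positive rate
    specificity = tn / (tn + fp)       # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = f1_from(precision, recall)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_time_gain(t_model: float, t_baseline: float) -> float:
    """Assumed meaning of 'gain': fractional reduction in time per step."""
    return (t_baseline - t_model) / t_baseline


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the paper).
    print(classification_metrics(tp=90, fp=10, tn=80, fn=20))
    # Consistency check against the precision and recall in the abstract.
    print(round(f1_from(0.9405, 0.9588), 4))  # prints 0.9496
```

As a quick consistency check, the reported precision (94.05%) and recall (95.88%) give an F1-score of about 0.9496, which matches the reported 94.95% to within the rounding of the published figures.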