BERT- and CNN-based TOBEAT approach for unwelcome tweets detection

Bibliographic Details
Published in: Social network analysis and mining 2022-12, Vol.12 (1), p.144, Article 144
Main Authors: Ouni, Sarra, Fkih, Fethi, Omri, Mohamed Nazih
Format: Article
Language: English
container_issue 1
container_start_page 144
container_title Social network analysis and mining
container_volume 12
creator Ouni, Sarra
Fkih, Fethi
Omri, Mohamed Nazih
description Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it has experienced exponential growth, with increasing interest from registered users. This popularity attracts cybercriminals (or spammers) who spread malware and advertisements via links shared in tweets and hijack hot topics to get the attention of legitimate users. These spammers send unsolicited messages, known as spam (the same term used for junk e-mail), and carry out other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to address the problem of Twitter spam. However, the main existing methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that extracts new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach is based on BERT (bidirectional encoder representations from transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The resulting final feature vector is then fed into a supervised classifier. The experimental results, obtained on a Twitter data collection, show that CNN is the most suitable classifier for the spam filtering task. Moreover, the comparative study shows that, on the Twitter data set, our approach outperforms previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy, precision, recall, F1-score and G-mean, respectively. In terms of time consumption, our approach recorded 0.5164 seconds per training step. In percentage terms, this represents a time gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.
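The abstract reports five evaluation metrics without defining them. Below is a minimal Python sketch of how these metrics are conventionally computed from binary predictions; it is not the authors' evaluation code. It assumes label 1 marks a spam tweet and 0 a legitimate one, and it takes G-mean as the square root of recall times specificity, a common definition that the abstract does not confirm. The function name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1-score and G-mean for a binary task.

    Assumption: 1 = spam (positive class), 0 = legitimate tweet.
    Assumption: G-mean = sqrt(recall * specificity); the source abstract
    does not state which G-mean definition was used.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four confusion-matrix cells.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num: float, den: float) -> float:
        # Return 0.0 instead of raising ZeroDivisionError when a cell sum is empty.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # sensitivity, true positive rate
    specificity = safe_div(tn, tn + fp)     # true negative rate
    f1 = safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the paper).
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(truth, preds).items():
        print(f"{name}: {value:.4f}")
```

On these toy labels, precision, recall and F1 all equal 0.75, while G-mean is about 0.79: it also reflects the specificity on the legitimate class (5/6), which precision and recall alone do not capture.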
doi_str_mv 10.1007/s13278-022-00970-0
format article
publisher Vienna: Springer Vienna
rights The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
identifier ISSN: 1869-5450
ispartof Social network analysis and mining, 2022-12, Vol.12 (1), p.144, Article 144
issn 1869-5450
1869-5469
language eng
recordid cdi_proquest_journals_2920073714
source International Bibliography of the Social Sciences (IBSS); Social Science Premium Collection (Proquest) (PQ_SDU_P3); Springer Link
subjects Advertisements
Applications of Graph Theory and Complex Networks
Artificial neural networks
Bidirectionality
Classifiers
Comparative studies
Computer Science
Crime prevention
Cybercrime
Data collection
Data Mining and Knowledge Discovery
Economics
Electronic publishing
Extraction
Game Theory
Humanities
Law
Machine learning
Methodology of the Social Sciences
Networking
Neural networks
Original Article
Popularity
Semantics
Social and Behav. Sciences
Social media
Social networks
Spamming
Statistics for Social Sciences
Topics
User behavior
title BERT- and CNN-based TOBEAT approach for unwelcome tweets detection
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T15%3A34%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=BERT-%20and%20CNN-based%20TOBEAT%20approach%20for%20unwelcome%20tweets%20detection&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Ouni,%20Sarra&rft.date=2022-12-01&rft.volume=12&rft.issue=1&rft.spage=144&rft.pages=144-&rft.artnum=144&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-022-00970-0&rft_dat=%3Cproquest_cross%3E2920073714%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2920073714&rft_id=info:pmid/&rfr_iscdi=true