
BERT- and CNN-based TOBEAT approach for unwelcome tweets detection

Bibliographic Details
Published in:	Social network analysis and mining 2022-12, Vol.12 (1), p.144, Article 144
Main Authors:	Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih
Format:	Article
Language:	English
Subjects:	Advertisements; Applications of Graph Theory and Complex Networks; Artificial neural networks; Bidirectionality; Classifiers; Comparative studies; Computer Science; Crime prevention; Cybercrime; Data collection; Data Mining and Knowledge Discovery; Economics; Electronic publishing; Extraction; Game Theory; Humanities; Law; Machine learning; Methodology of the Social Sciences; Networking; Neural networks; Original Article; Popularity; Semantics; Social and Behav. Sciences; Social media; Social networks; Spamming; Statistics for Social Sciences; Topics; User behavior

container_end_page
container_issue	1
container_start_page	144
container_title	Social network analysis and mining
container_volume	12
creator	Ouni, Sarra; Fkih, Fethi; Omri, Mohamed Nazih
description	Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. It has recently grown exponentially, with increasing interest from registered users. This popularity attracts cybercriminals (or spammers), who spread malware and advertisements via links shared in tweets and hijack trending topics to get the attention of legitimate users. In addition, these spammers send violent messages known as spam (the same term used for junk e-mail) and engage in other malicious activity. Spam on Twitter has become a pervasive problem that must be solved. In this context, several solutions have been proposed to address Twitter spam. However, the main existing methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that extracts new TOpics-Based fEAtures (TOBEAT) from Twitter data. Our approach is based on BERT (Bidirectional Encoder Representations from Transformers) and a CNN (convolutional neural network). To implement our solution, we developed a new framework that combines topic-based features with contextual BERT embeddings. The resulting feature vector is then fed into a supervised classifier. Experiments on a Twitter data collection show that CNN is the most suitable classifier for the spam filtering task. Moreover, the comparative study shows that, on the Twitter data set, our approach outperforms previously published approaches, achieving 94.97% accuracy, 94.05% precision, 95.88% recall, 94.95% F1-score, and 94.92% G-mean. In terms of time consumption, our approach took 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.
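The description reports five standard binary-classification metrics. The sketch below shows how these metrics are conventionally computed from a confusion matrix; it is not the authors' evaluation code. It assumes spam is the positive class (label 1) and that G-mean is the geometric mean of recall and specificity, which is the usual definition for imbalanced classification but is not stated in this record. The function name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, F1-score and G-mean.

    Assumes labels in {0, 1}, with 1 = spam (positive class) and
    0 = legitimate tweet. G-mean is assumed to be sqrt(recall * specificity).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Treat 0/0 as 0.0 so degenerate inputs do not raise.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # true-positive rate (sensitivity)
    specificity = ratio(tn, tn + fp)   # true-negative rate
    f1 = ratio(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Toy labels: 4 spam tweets, 6 legitimate tweets.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    for name, value in binary_metrics(truth, preds).items():
        print(f"{name}: {value:.4f}")
```

Because G-mean uses specificity rather than precision, it penalizes a classifier that reaches high recall by flagging many legitimate tweets as spam, which is why it is often reported alongside F1-score on imbalanced spam data.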
doi_str_mv	10.1007/s13278-022-00970-0
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2920073714</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2920073714</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</originalsourceid><addsrcrecordid>eNp9kE1LAzEQhoMoWGr_gKeA5-jka5Mc21I_QFqQvYc0m2hLu1uTLcV_b3RFb55mDu_zzvAgdE3hlgKou0w5U5oAYwTAKCBwhkZUV4ZIUZnz313CJZrkvAUACpwbqEZoNlu81AS7tsHz5ZKsXQ4NrlezxbTG7nBInfNvOHYJH9tT2PluH3B_CqHPuAl98P2ma6_QRXS7HCY_c4zq-0U9fyTPq4en-fSZeE5NTzwwZZw0SoNj63WoqHbglPSRmSi1EVSZUJkoRGNc40100kvBhfaq8lzzMboZastT78eQe7vtjqktFy0zrHjgioqSYkPKpy7nFKI9pM3epQ9LwX7ZsoMtW2zZb1sWCsQHKJdw-xrSX_U_1CcbmGoh</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2920073714</pqid></control><display><type>article</type><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>Social Science Premium Collection (Proquest) (PQ_SDU_P3)</source><source>Springer Link</source><creator>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed Nazih</creator><creatorcontrib>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed Nazih</creatorcontrib><description>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. 
In this context, several solutions have been proposed to reveal the problem of Twitter spam. However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy , precision , recall , F 1 - s c o r e , and G - m e a n , respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. 
In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</description><identifier>ISSN: 1869-5450</identifier><identifier>EISSN: 1869-5469</identifier><identifier>DOI: 10.1007/s13278-022-00970-0</identifier><language>eng</language><publisher>Vienna: Springer Vienna</publisher><subject>Advertisements ; Applications of Graph Theory and Complex Networks ; Artificial neural networks ; Bidirectionality ; Classifiers ; Comparative studies ; Computer Science ; Crime prevention ; Cybercrime ; Data collection ; Data Mining and Knowledge Discovery ; Economics ; Electronic publishing ; Extraction ; Game Theory ; Humanities ; Law ; Machine learning ; Methodology of the Social Sciences ; Networking ; Neural networks ; Original Article ; Popularity ; Semantics ; Social and Behav. Sciences ; Social media ; Social networks ; Spamming ; Statistics for Social Sciences ; Topics ; User behavior</subject><ispartof>Social network analysis and mining, 2022-12, Vol.12 (1), p.144, Article 144</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2022. 
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</citedby><cites>FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2920073714?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,12846,21393,27923,27924,33222,33610,43732</link.rule.ids></links><search><creatorcontrib>Ouni, Sarra</creatorcontrib><creatorcontrib>Fkih, Fethi</creatorcontrib><creatorcontrib>Omri, Mohamed Nazih</creatorcontrib><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><title>Social network analysis and mining</title><addtitle>Soc. Netw. Anal. Min</addtitle><description>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to reveal the problem of Twitter spam. 
However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy , precision , recall , F 1 - s c o r e , and G - m e a n , respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. 
In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</description><subject>Advertisements</subject><subject>Applications of Graph Theory and Complex Networks</subject><subject>Artificial neural networks</subject><subject>Bidirectionality</subject><subject>Classifiers</subject><subject>Comparative studies</subject><subject>Computer Science</subject><subject>Crime prevention</subject><subject>Cybercrime</subject><subject>Data collection</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Economics</subject><subject>Electronic publishing</subject><subject>Extraction</subject><subject>Game Theory</subject><subject>Humanities</subject><subject>Law</subject><subject>Machine learning</subject><subject>Methodology of the Social Sciences</subject><subject>Networking</subject><subject>Neural networks</subject><subject>Original Article</subject><subject>Popularity</subject><subject>Semantics</subject><subject>Social and Behav. 
Sciences</subject><subject>Social media</subject><subject>Social networks</subject><subject>Spamming</subject><subject>Statistics for Social Sciences</subject><subject>Topics</subject><subject>User behavior</subject><issn>1869-5450</issn><issn>1869-5469</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><sourceid>ALSLI</sourceid><sourceid>M2R</sourceid><recordid>eNp9kE1LAzEQhoMoWGr_gKeA5-jka5Mc21I_QFqQvYc0m2hLu1uTLcV_b3RFb55mDu_zzvAgdE3hlgKou0w5U5oAYwTAKCBwhkZUV4ZIUZnz313CJZrkvAUACpwbqEZoNlu81AS7tsHz5ZKsXQ4NrlezxbTG7nBInfNvOHYJH9tT2PluH3B_CqHPuAl98P2ma6_QRXS7HCY_c4zq-0U9fyTPq4en-fSZeE5NTzwwZZw0SoNj63WoqHbglPSRmSi1EVSZUJkoRGNc40100kvBhfaq8lzzMboZastT78eQe7vtjqktFy0zrHjgioqSYkPKpy7nFKI9pM3epQ9LwX7ZsoMtW2zZb1sWCsQHKJdw-xrSX_U_1CcbmGoh</recordid><startdate>20221201</startdate><enddate>20221201</enddate><creator>Ouni, Sarra</creator><creator>Fkih, Fethi</creator><creator>Omri, Mohamed Nazih</creator><general>Springer Vienna</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0-V</scope><scope>3V.</scope><scope>7XB</scope><scope>88J</scope><scope>8BJ</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ALSLI</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FQK</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JBE</scope><scope>JQ2</scope><scope>K7-</scope><scope>M2R</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope></search><sort><creationdate>20221201</creationdate><title>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</title><author>Ouni, Sarra ; Fkih, Fethi ; Omri, Mohamed 
Nazih</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Advertisements</topic><topic>Applications of Graph Theory and Complex Networks</topic><topic>Artificial neural networks</topic><topic>Bidirectionality</topic><topic>Classifiers</topic><topic>Comparative studies</topic><topic>Computer Science</topic><topic>Crime prevention</topic><topic>Cybercrime</topic><topic>Data collection</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Economics</topic><topic>Electronic publishing</topic><topic>Extraction</topic><topic>Game Theory</topic><topic>Humanities</topic><topic>Law</topic><topic>Machine learning</topic><topic>Methodology of the Social Sciences</topic><topic>Networking</topic><topic>Neural networks</topic><topic>Original Article</topic><topic>Popularity</topic><topic>Semantics</topic><topic>Social and Behav. 
Sciences</topic><topic>Social media</topic><topic>Social networks</topic><topic>Spamming</topic><topic>Statistics for Social Sciences</topic><topic>Topics</topic><topic>User behavior</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ouni, Sarra</creatorcontrib><creatorcontrib>Fkih, Fethi</creatorcontrib><creatorcontrib>Omri, Mohamed Nazih</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Social Sciences Premium Collection【Remote access available】</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Social Science Database (Alumni Edition)</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central</collection><collection>Social Science Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>Advanced Technologies & Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Social Science Database (ProQuest)</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced 
Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><jtitle>Social network analysis and mining</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ouni, Sarra</au><au>Fkih, Fethi</au><au>Omri, Mohamed Nazih</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>BERT- and CNN-based TOBEAT approach for unwelcome tweets detection</atitle><jtitle>Social network analysis and mining</jtitle><stitle>Soc. Netw. Anal. Min</stitle><date>2022-12-01</date><risdate>2022</risdate><volume>12</volume><issue>1</issue><spage>144</spage><pages>144-</pages><artnum>144</artnum><issn>1869-5450</issn><eissn>1869-5469</eissn><abstract>Social media platforms have become an inevitable part of our lives today. Twitter is one such major online social networking platform. Recently, it pointed to exponential growth with growing interest from registered users. This popularity attracts cybercriminals (or spammers) to spread malware and advertisements via links shared in tweets, and hijack hot topics to get the attention of legitimate users. Instead, these spammers send violent messages known as spam, also known as junk e-mail, and spread other malicious activity. Spam on Twitter has become an inescapable problem that must be solved. In this context, several solutions have been proposed to reveal the problem of Twitter spam. However, the main existing proposed methods suffer from many limitations and cannot perfectly detect spammers on social networks. In this paper, we propose a new approach that considers the extraction of new TOpics-Based fEAtures (TOBEAT), from Twitter data. 
Our approach is based on BERT (bidirectional encoder representations of transformers) and CNN (convolutional neural network). To implement our solution, a new framework was developed to combine topic-based features with contextual BERT embeddings. The obtained final features vector is then fed into the supervised classifier for classification. The experimental results, performed on a Twitter data collection, show that CNN is the most suitable classifier to solve the spam filtering task. Moreover, the analysis of the results of the comparative study shows that by using the Twitter data set, our approach outperforms the previously published approaches and achieves 94.97%, 94.05%, 95.88%, 94.95% and 94.92% in accuracy , precision , recall , F 1 - s c o r e , and G - m e a n , respectively. In terms of time consumption, our approach recorded a time of 0.5164 seconds per training step. In percentage terms, this represents a gain of 82% compared to the TOBEAT-BERT+SVM model, 76.1% compared to the TOBEAT-BERT+NB model, and 70% compared to the TOBEAT-BERT+RF model.</abstract><cop>Vienna</cop><pub>Springer Vienna</pub><doi>10.1007/s13278-022-00970-0</doi></addata></record>
fulltext	fulltext
identifier	ISSN: 1869-5450
ispartof	Social network analysis and mining, 2022-12, Vol.12 (1), p.144, Article 144
issn	1869-5450 1869-5469
language	eng
recordid	cdi_proquest_journals_2920073714
source	International Bibliography of the Social Sciences (IBSS); Social Science Premium Collection (Proquest) (PQ_SDU_P3); Springer Link
subjects	Advertisements; Applications of Graph Theory and Complex Networks; Artificial neural networks; Bidirectionality; Classifiers; Comparative studies; Computer Science; Crime prevention; Cybercrime; Data collection; Data Mining and Knowledge Discovery; Economics; Electronic publishing; Extraction; Game Theory; Humanities; Law; Machine learning; Methodology of the Social Sciences; Networking; Neural networks; Original Article; Popularity; Semantics; Social and Behav. Sciences; Social media; Social networks; Spamming; Statistics for Social Sciences; Topics; User behavior
title	BERT- and CNN-based TOBEAT approach for unwelcome tweets detection
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T15%3A34%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=BERT-%20and%20CNN-based%20TOBEAT%20approach%20for%20unwelcome%20tweets%20detection&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Ouni,%20Sarra&rft.date=2022-12-01&rft.volume=12&rft.issue=1&rft.spage=144&rft.pages=144-&rft.artnum=144&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-022-00970-0&rft_dat=%3Cproquest_cross%3E2920073714%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-c0279a59780a2bbe618a0a75cf29f5894179e69f44d9adc9fa5c54348c76c383%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2920073714&rft_id=info:pmid/&rfr_iscdi=true