
Machine learning algorithm-based spam detection in social networks

Many social media (SM) platforms have emerged as a result of the online social network’s (OSN) rapid expansion. SM has become important in day-to-day life, and spammers have turned their attention to SM. Spam detection (SD) is done in two different ways, such as machine learning (ML) and expert-base...

Bibliographic Details
Published in:	Social network analysis and mining 2023-08, Vol.13 (1), p.104
Main Authors:	Sumathi, M; Raja, S. P
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Artificial intelligence; Business metrics; Classification; Classifiers; Cybercrime; Datasets; Decision trees; Deep learning; Dictionaries; Electronic mail systems; Flexibility; Literature reviews; Machine learning; Neural networks; Performance evaluation; Recall; Social media; Social networks; Statistical analysis; Support vector machines

container_issue	1
container_start_page	104
container_title	Social network analysis and mining
container_volume	13
creator	Sumathi, M; Raja, S. P
description	The rapid expansion of online social networks (OSNs) has given rise to many social media (SM) platforms. As SM has become important in day-to-day life, spammers have turned their attention to it. Spam detection (SD) is done in two ways: machine learning (ML) and expert-based detection. The accuracy of expert-based detection depends on expert knowledge, and it takes a long time to detect spam. Thus, ML-based spam detection is preferred in OSNs. Spam identification on social networks is a difficult task involving a variety of factors, and spam and ham messages form an imbalanced data distribution, which gives spammers flexibility to corrupt users' devices. ML algorithms such as logistic regression (LR), K-nearest neighbor (KNN), decision trees (DT), random forest (RF), support vector machine (SVM), eXtreme gradient boosting (XGB), voting classifier (VC) and extra tree classifier (ETC) are used to address the imbalance and to attain high assessment accuracy on imbalanced datasets. The ETC method minimizes bias through the original sampling process. To reduce processing complexity, the ETC method uses a smaller constant factor rather than a larger one. Thus, the ETC technique produces better data splitting than the DT and RF techniques. Text is vectorized by vectorizers, and all the related results are stored in them. The VC is an ensemble method that integrates predictions from several methods to forecast an output class depending on which predictions have the highest probability. The multi-class results are aggregated, and the majority-voted class is forecast. The experimental results show that, compared to KNN, NB, ETC, RF, SVC, LR, XGB and DT, the proposed VC provides a higher classification accuracy of 97.96%, with 97.56% precision, 89.95% recall and a 91.96% F1-measure. Similarly, ETC provides 97.77% accuracy, 98.31% precision, 84.78% recall and a 91.05% F1-measure. Compared to conventional ML algorithms, VC and ETC provide higher accuracy, precision, recall and F1-measures. Thus, ETC and VC are preferable for spam detection. A website has been designed to classify messages as spam or not.
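The voting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three base-classifier outputs and the probability values below are hypothetical, chosen only to show how majority ("hard") voting and probability-averaged ("soft") voting can disagree on the same message. The last function assumes the F1-measure is the usual harmonic mean of precision and recall, and applies it to the ETC precision and recall quoted in the abstract.

```python
from collections import Counter


def hard_vote(labels):
    """Majority vote: return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average each class probability across classifiers; return the top class."""
    classes = probabilities[0].keys()
    mean = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(mean, key=mean.get)


def f1(precision, recall):
    """F1-measure as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of three base classifiers for a single message.
labels = ["spam", "ham", "spam"]
probs = [
    {"spam": 0.55, "ham": 0.45},
    {"spam": 0.10, "ham": 0.90},
    {"spam": 0.60, "ham": 0.40},
]

print(hard_vote(labels))  # two of three classifiers say "spam"
print(soft_vote(probs))   # one confident "ham" outweighs two weak "spam" votes
print(round(100 * f1(0.9831, 0.8478), 2))  # ETC precision and recall from the abstract
```

In this toy case the two schemes return different classes, which is why the choice between majority voting and highest-average-probability voting matters when the base classifiers differ in confidence.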
doi_str_mv	10.1007/s13278-023-01108-6
format	article
fulltext	fulltext
identifier	ISSN: 1869-5450
ispartof	Social network analysis and mining, 2023-08, Vol.13 (1), p.104
issn	1869-5450 1869-5469
language	eng
recordid	cdi_proquest_journals_2919613146
source	International Bibliography of the Social Sciences (IBSS); Social Science Premium Collection (Proquest) (PQ_SDU_P3); Springer Link
subjects	Accuracy; Algorithms; Artificial intelligence; Business metrics; Classification; Classifiers; Cybercrime; Datasets; Decision trees; Deep learning; Dictionaries; Electronic mail systems; Flexibility; Literature reviews; Machine learning; Neural networks; Performance evaluation; Recall; Social media; Social networks; Statistical analysis; Support vector machines
title	Machine learning algorithm-based spam detection in social networks
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T09%3A39%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning%20algorithm-based%20spam%20detection%20in%20social%20networks&rft.jtitle=Social%20network%20analysis%20and%20mining&rft.au=Sumathi,%20M&rft.date=2023-08-19&rft.volume=13&rft.issue=1&rft.spage=104&rft.pages=104-&rft.issn=1869-5450&rft.eissn=1869-5469&rft_id=info:doi/10.1007/s13278-023-01108-6&rft_dat=%3Cproquest%3E2919613146%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c363t-8c9d89e4257eae797f536b28b0926d70a67d87cac8a83ceeb42e8e9b80642a9d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2919613146&rft_id=info:pmid/&rfr_iscdi=true