Kashif: A Chrome Extension for Classifying Arabic Content on Web Pages Using Machine Learning

Bibliographic Details
Published in: Applied sciences, 2024-10, Vol. 14 (20), p. 9222
Main Authors: Aljabri, Malak, Altamimi, Hanan S., Albelali, Shahd A., Al-Harbi, Maimunah, Alhuraib, Haya T., Alotaibi, Najd K., Alahmadi, Amal A., Alhaidari, Fahd, Mohammad, Rami Mustafa A.
Format: Article
Language: English
container_issue 20
container_start_page 9222
container_title Applied sciences
container_volume 14
creator Aljabri, Malak
Altamimi, Hanan S.
Albelali, Shahd A.
Al-Harbi, Maimunah
Alhuraib, Haya T.
Alotaibi, Najd K.
Alahmadi, Amal A.
Alhaidari, Fahd
Mohammad, Rami Mustafa A.
description Search engines are significant tools for finding and retrieving information. Every day, many new web pages in various languages are added, and the threat of cyberattacks is expanding rapidly with this massive volume of data. Most studies on the detection of malicious websites focus on English-language websites, so more research is needed on detecting malicious Arabic-content websites. In this research, we aimed to investigate the security of Arabic-content websites by developing a detection tool that analyzes Arabic content using artificial intelligence (AI) techniques. We contributed to the fields of cybersecurity and AI by building a new dataset of 4048 Arabic-content websites. We conducted a comparative performance evaluation of four machine-learning (ML) models, namely extreme gradient boosting, support vector machines, decision trees, and random forests, combined with feature extraction and selection techniques. The best-performing model, a random forest (RF) using features selected via the chi-square technique, was then integrated into a Chrome plugin. The resulting plugin attained an accuracy of 92.96% in classifying Arabic-content websites as phishing, suspicious, or benign. To our knowledge, this is the first tool designed specifically for Arabic-content websites.
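As an illustration of the chi-square feature selection step named in the abstract, the following minimal Python sketch scores each non-negative feature (for example, a term count extracted from a page's Arabic text) against the class label and keeps the k highest-scoring features. It follows the common chi-square formulation used for feature ranking (observed per-class feature totals compared with totals expected from class frequencies). The function names `chi2_scores` and `select_k_best` and the toy data are hypothetical; this is not the authors' implementation or feature set.

```python
# Hypothetical sketch of chi-square feature ranking for a multi-class
# classifier (labels such as "phishing", "suspicious", "benign").
# Assumes every feature value is non-negative (e.g. a term count).
# Not the Kashif authors' code.


def chi2_scores(X, y):
    """Return one chi-square statistic per feature column of X."""
    n_features = len(X[0])
    classes = sorted(set(y))
    # Observed: total value of each feature within each class.
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            observed[label][j] += value
    # Class probabilities estimated from label frequencies.
    n = len(y)
    prior = {c: y.count(c) / n for c in classes}
    scores = []
    for j in range(n_features):
        total = sum(observed[c][j] for c in classes)
        score = 0.0
        for c in classes:
            # Expected total if the feature were independent of the class.
            expected = prior[c] * total
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: 4 pages x 3 count features (hypothetical).
    X = [[3, 0, 1], [2, 0, 1], [0, 4, 1], [0, 3, 1]]
    y = ["phishing", "phishing", "benign", "benign"]
    print(chi2_scores(X, y))       # [5.0, 7.0, 0.0]
    print(select_k_best(X, y, 2))  # [0, 1]
```

Feature 2 occurs equally in every page, so it carries no class information and scores zero. In a full pipeline, the selected columns would then be used to train a classifier such as a random forest; scikit-learn, for instance, provides `chi2`, `SelectKBest`, and `RandomForestClassifier` for these steps.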
doi_str_mv 10.3390/app14209222
format article
publisher MDPI AG, Basel
date 2024-10-01
rights COPYRIGHT 2024 MDPI AG
2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
orcidid 0000-0002-6371-1614
0000-0003-4383-0269
0000-0002-2612-1615
0000-0001-7452-5473
linktopdf https://www.proquest.com/docview/3120539496/fulltextPDF?pq-origsite=primo
linktohtml https://www.proquest.com/docview/3120539496?pq-origsite=primo
identifier ISSN: 2076-3417
ispartof Applied sciences, 2024-10, Vol.14 (20), p.9222
issn 2076-3417
2076-3417
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_126ea7bb37304a9a9d1189bfe718436d
source Publicly Available Content (ProQuest)
subjects Accuracy
Algorithms
Analysis
Artificial intelligence
benign
Cybercrime
Cybersecurity
Cyberterrorism
Datasets
Deep learning
Helium
Identity theft
International economic relations
Internet
Keywords
Machine learning
malicious
Malware
Neural networks
Petroleum industry
Phishing
random forest
Research methodology
Support vector machines
Trends
URLs
Web browsers
Web sites
title Kashif: A Chrome Extension for Classifying Arabic Content on Web Pages Using Machine Learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T04%3A01%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Kashif:%20A%20Chrome%20Extension%20for%20Classifying%20Arabic%20Content%20on%20Web%20Pages%20Using%20Machine%20Learning&rft.jtitle=Applied%20sciences&rft.au=Aljabri,%20Malak&rft.date=2024-10-01&rft.volume=14&rft.issue=20&rft.spage=9222&rft.pages=9222-&rft.issn=2076-3417&rft.eissn=2076-3417&rft_id=info:doi/10.3390/app14209222&rft_dat=%3Cgale_doaj_%3EA814380387%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c291t-b312901b8d498ba00de75b5d7b63c9ce3c4db0c7d487c6d742527c21a8ad482a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3120539496&rft_id=info:pmid/&rft_galeid=A814380387&rfr_iscdi=true