Kashif: A Chrome Extension for Classifying Arabic Content on Web Pages Using Machine Learning

Bibliographic Details
Published in: Applied Sciences, 2024-10, Vol. 14 (20), p. 9222
Main Authors: Aljabri, Malak; Altamimi, Hanan S.; Albelali, Shahd A.; Al-Harbi, Maimunah; Alhuraib, Haya T.; Alotaibi, Najd K.; Alahmadi, Amal A.; Alhaidari, Fahd; Mohammad, Rami Mustafa A.
Format: Article
Language: English
Description
Summary: Search engines are significant tools for finding and retrieving information. Many new web pages in various languages are added every day, and the threat of cyberattacks is growing rapidly with this massive volume of data. Most studies on detecting malicious websites focus on English-language websites, so more work is needed on detecting malicious Arabic-content websites. In this research, we investigated the security of Arabic-content websites by developing a detection tool that analyzes Arabic content using artificial intelligence (AI) techniques. We contributed to the fields of cybersecurity and AI by building a new dataset of 4048 Arabic-content websites. Using feature extraction and selection techniques, we built and comparatively evaluated four machine-learning (ML) models: extreme gradient boosting, support vector machines, decision trees, and random forests. The best-performing model, a random forest (RF) using features selected via the chi-square technique, was then integrated into a Chrome plugin. The resulting plugin attained an accuracy of 92.96% in classifying Arabic-content websites as phishing, suspicious, or benign. To our knowledge, this is the first tool designed specifically for Arabic-content websites.
ISSN: 2076-3417
DOI: 10.3390/app14209222