Optimizing Phishing Detection: Leveraging URL Features with Machine Learning

Bibliographic Details
Main Authors:	Rani, R. Suneetha, Silpa, N., Satish, G. Naga, Amrutha, N., Reddy, G. Nimisha
Format:	Conference Proceeding
Language:	English
Subjects:	Accuracy; Classification Techniques; Cybersecurity; Ensemble Learning; Feature extraction; Logistic regression; Machine learning algorithms; Natural language processing; Natural Language Processing (NLP); Nearest neighbor methods; Phishing; Phishing Detection; Random forests; Support vector machines; Uniform resource locators

Description
Summary:	The expansion of the internet has accelerated the shift from traditional business methods to digital ones. Criminal activity has likewise moved online, taking advantage of the anonymity the internet provides. Cybercriminals use tactics such as phishing to trick individuals into visiting fake websites in order to steal personal information such as usernames, passwords, and account details. Phishing attacks exploit user vulnerabilities, making it hard to tell legitimate websites from fraudulent ones. This study introduces a novel, real-time approach to phishing detection that combines classification techniques with insights from Natural Language Processing (NLP). Rather than relying on visual comparison of websites, as traditional detection methods do, it analyzes URL features such as URL length, the presence of underscores, and the number of dashes. The models evaluated are Support Vector Machines (SVM), Random Forest, XGBoost, AdaBoost, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Light Gradient Boosting Machine (LGBM). XGBoost achieved the highest accuracy at 98.7%, followed closely by Random Forest at 98.4%. These results indicate that ensemble approaches, especially XGBoost and Random Forest, are effective at separating phishing sites from legitimate ones.
ISSN:	2469-5556
DOI:	10.1109/ICACCS60874.2024.10717158
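
The summary names three URL features: URL length, underscores, and dashes. As a rough illustration of what extracting them might look like, here is a minimal Python sketch. The function names `extract_url_features` and `to_feature_vector`, the dictionary keys, and the feature ordering are all assumptions made for this example; the record does not describe the authors' actual implementation or their full feature set.

```python
# Minimal sketch of lexical URL feature extraction.
# Assumption: only the three features named in the summary are computed;
# the paper's real feature set and preprocessing are not given in this record.

# Hypothetical, fixed ordering so every URL maps to a vector of the same shape.
FEATURE_ORDER = ("url_length", "underscore_count", "dash_count")


def extract_url_features(url: str) -> dict[str, int]:
    """Return simple character-level counts for a single URL string."""
    return {
        "url_length": len(url),              # total number of characters
        "underscore_count": url.count("_"),  # occurrences of '_'
        "dash_count": url.count("-"),        # occurrences of '-'
    }


def to_feature_vector(url: str) -> list[int]:
    """Flatten the feature dict into a list in FEATURE_ORDER."""
    features = extract_url_features(url)
    return [features[name] for name in FEATURE_ORDER]


if __name__ == "__main__":
    # Made-up example URL, not taken from any dataset.
    sample = "http://secure-login_example.test/account-update"
    print(extract_url_features(sample))
    print(to_feature_vector(sample))
```

Vectors of this kind could then be passed, together with phishing/legitimate labels, to any of the classifiers listed in the summary. Which library, hyperparameters, and dataset the authors used is not stated here.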