Optimizing Phishing Detection: Leveraging URL Features with Machine Learning
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | The expansion of the internet has accelerated the shift from traditional business methods to digital ones. Criminal activity has likewise moved online, taking advantage of the anonymity the internet provides. Cybercriminals use tactics such as phishing to trick individuals into visiting fake websites, with the aim of stealing personal information such as usernames, passwords, and account details. Phishing attacks exploit user vulnerabilities, making it hard to tell legitimate websites from fraudulent ones. This study introduces a novel, real-time solution to combat phishing that uses classification techniques and insights gained from Natural Language Processing (NLP). Rather than relying on visual comparisons of websites, as traditional phishing detection methods do, this research focuses on the analysis of URL features. It investigates features such as URL length, the presence of underscores, and the occurrence of dashes to improve the detection of phishing attempts. The models explored include Support Vector Machines (SVM), Random Forest, XGBoost, AdaBoost, Logistic Regression (LR), K-Nearest Neighbors (KNN), and Light Gradient Boosting Machine (LGBM). XGBoost achieved 98.7% accuracy, making it the most effective model at identifying sophisticated phishing patterns, followed closely by Random Forest at 98.4% accuracy. Overall, ensemble approaches, especially XGBoost and Random Forest, prove effective at reliably separating phishing sites from legitimate ones. |
---|---|
ISSN: | 2469-5556 |
DOI: | 10.1109/ICACCS60874.2024.10717158 |
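
The summary names three URL features: URL length, the presence of underscores, and the occurrence of dashes. As a minimal illustration of what extracting such features might look like, the sketch below computes them from a raw URL string. The function name `extract_url_features`, the dictionary keys, and the sample URL are hypothetical choices made for this example; the record does not describe the authors' actual feature set or preprocessing.

```python
def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string.

    Hypothetical helper: only the three features named in the summary
    are covered, and the key names are illustrative.
    """
    return {
        "url_length": len(url),              # total number of characters
        "num_underscores": url.count("_"),   # occurrences of "_"
        "num_dashes": url.count("-"),        # occurrences of "-"
    }


if __name__ == "__main__":
    # Made-up URL used purely for illustration.
    sample = "http://secure-login_example.com/a-b"
    print(extract_url_features(sample))
```

Feature dictionaries of this kind could then be assembled into a numeric matrix and passed to any of the classifiers listed in the summary; that step is omitted here because the record gives no details about the training setup.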