
An Effective Hybrid Outlier Selection Method for Breast Cancer Classification Using Machine Learning Algorithms

Bibliographic Details
Published in: Indian Journal of Science and Technology, November 2024, Vol. 17, No. 43, pp. 4494-4501
Main Authors: Tintu, P B; Veni, S; Priya, S Manju
Format: Article
Language: English
Description
Summary:
Objectives: To develop a model for predicting breast cancer. Cancer is one of the deadliest diseases and is regarded as the second leading cause of death in women worldwide. Early detection of cancer can save a patient's life. Because outliers can degrade a model's performance, eliminating them is the first factor to be considered.

Methods: This study used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which consists of 569 instances: 357 benign and 212 malignant. It has 32 attributes: an ID number, a class attribute with two labels (diagnosis: B = benign, M = malignant), and 30 real-valued attributes. These attributes are computed from a digitized image of a Fine Needle Aspiration (FNA) of a breast mass and describe the characteristics of the cell nuclei present in the image. The study proposes HOTSM, an outlier detection approach that handles anomalies in two stages. First, the Interquartile Range (IQR) is used to reduce the influence of outliers. The non-outlier data is then passed to an isolation forest, where the absolute mean error is calculated. Pearson's correlation is used to reduce dimensionality.

Findings: Two datasets were generated for performance evaluation, one using the isolation forest and the other using HOTSM. Both were tested with SVM, Decision Tree, and Random Forest classifiers, which achieved highest accuracies of 97.80%, 96.80%, and 98.4%, respectively. The dataset generated with the proposed method performed well, and the proposed model identifies breast cancer more accurately.

Novelty: The Interquartile Range was used to modify the traditional isolation forest algorithm, improving the performance metrics. Thorough removal of anomalies reduces the likelihood of misdiagnosis, although not all outliers can be excluded.

Keywords: Outliers, Breast Cancer, Accuracy, Machine Learning, Hybrid
ISSN: 0974-6846, 0974-5645
DOI: 10.17485/IJST/v17i43.2109