A comparative study of machine learning models with LASSO and SHAP feature selection for breast cancer prediction

Bibliographic Details
Published in:Healthcare analytics (New York, N.Y.), 2024-12, Vol.6, p.100353, Article 100353
Main Authors: Shaon, Md. Shazzad Hossain; Karim, Tasmin; Shakil, Md. Shahriar; Hasan, Md. Zahid
Format: Article
Language:English
Description
Summary:In recent decades, breast cancer has become the most prevalent cancer among women worldwide and a significant contributor to female mortality. Early identification of breast cancer can substantially reduce patient mortality and greatly improve the chance of effective treatment. Machine learning models have become crucial for classifying cancer and for improving both the accuracy and efficiency of diagnostic and treatment strategies. This study therefore focuses on early detection of breast cancer using a variety of machine learning algorithms and aims to identify the most effective feature selection process on an amalgamated dataset. Initially, we evaluated five traditional models and two meta-models on separate datasets. To find the most valuable features, the study used the Least Absolute Shrinkage and Selection Operator (LASSO) and SHapley Additive exPlanations (SHAP) as selection methods and analyzed them across a wide range of performance metrics. Additionally, we applied these models to the combined dataset and observed that the merged dataset was significantly beneficial for breast cancer diagnosis. Analysis of the feature selection strategies showed that the majority of models performed more accurately with SHAP. Notably, three traditional models and two meta-classifiers obtained an accuracy of 99.82%, demonstrating superior performance compared to state-of-the-art methods. This advancement lays a foundation for refining diagnostic tools and advancing medical science in this field.
•Study machine learning models for breast cancer prediction and diagnosis.
•Integrate two Wisconsin Breast Cancer datasets to generate a larger, more inclusive dataset.
•Use LASSO and SHAP feature selection methods to determine the features related to breast cancer detection.
•Review the performance criteria and use the classifier that performs well and has higher accuracy.
•Analyze the relevance of the different models and use the selected machine learning models.
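
The summary names LASSO as one of the two feature selection methods but this record does not describe how it was applied. As a rough illustration only, the sketch below shows the general idea behind LASSO-based selection: an L1 penalty shrinks weak coefficients to exactly zero, and the features with nonzero coefficients are kept. The data, the feature names (`feature_a`, `feature_b`, `feature_c`), the penalty value and the helper functions are all hypothetical and are not taken from the study. The sketch uses a plain linear-regression LASSO solved by coordinate descent in pure Python, whereas a library implementation such as scikit-learn's `Lasso` would normally be used in practice.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    Assumes no intercept is needed and that no column of X is all zeros.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for i in range(n_samples):
                prediction = sum(X[i][k] * weights[k] for k in range(n_features))
                # Residual with feature j's own contribution added back in.
                partial_residual = y[i] - prediction + X[i][j] * weights[j]
                rho += X[i][j] * partial_residual
                scale += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n_samples, alpha) / (scale / n_samples)
    return weights


# Hypothetical toy data (NOT from the study): three centred, mutually
# orthogonal features. The target is 2*a - 1*b + 0.05*c, so feature_c
# carries almost no signal.
feature_names = ["feature_a", "feature_b", "feature_c"]
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [1.05, 2.95, -3.05, -0.95]

# alpha is an illustrative penalty strength, chosen by hand here.
weights = lasso_coordinate_descent(X, y, alpha=0.1)

# Keep only the features whose coefficient was not shrunk to zero.
selected = [name for name, w in zip(feature_names, weights) if w != 0.0]
print(selected)  # ['feature_a', 'feature_b']
```

With the penalty set to 0.1, the weak coefficient on `feature_c` falls inside the thresholding band and is set to zero, while the two informative features survive with slightly shrunken weights. A larger penalty would remove more features, so in a real pipeline the penalty would typically be tuned by cross-validation.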
ISSN:2772-4425
DOI:10.1016/j.health.2024.100353