Chi2-MI: A hybrid feature selection based machine learning approach in diagnosis of chronic kidney disease
Published in: | Intelligent systems with applications 2022-11, Vol.16, p.200144, Article 200144 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Development of an intelligent diagnosis system to detect chronic kidney disease.
• A hybrid wrapper feature selection method (Chi2-MI) is proposed and applied.
• Data pre-processing methods are adopted to prepare the dataset for the model.
• The most impactful features, ranked by correlation score, are selected to predict CKD.
• Among 14 learning models, the Extra Trees classifier diagnoses CKD with 98% accuracy.
Early detection and characterization are considered crucial in treating and controlling chronic renal disease. Because of the rising number of patients, the high risk of progression to end-stage renal disease, and the poor prognosis for morbidity and mortality, chronic kidney disease (CKD) places a significant burden on the healthcare system. Detecting CKD in its early stages is critical for saving millions of lives. The novelty of this study lies in developing a diagnosis system that detects chronic kidney disease using different Machine Learning (ML) algorithms supported by a hybrid feature selection approach. The study uses 400 clinical records of CKD patients from the dataset supplied by the University of California Irvine (UCI) through its Machine Learning Repository. Several data preparation techniques are adopted to prepare the dataset for the prediction model: encoding categorical features, imputing missing values, removing outliers, handling class imbalance, scaling features to a common range, and selecting relevant features. A hybrid feature selection approach based on the Chi-squared test (Chi2) and Mutual Information (MI) is proposed to remove redundant features, and a Pearson correlation matrix is also computed to identify the most important features for prediction. Lastly, of the 14 machine learning models evaluated, the Extra Trees classifier diagnoses CKD with 98% accuracy and a 2% true negative rate without data leakage. The Bagging classifier, by contrast, performed worst, with only 60% accuracy. |
---|---|
ISSN: | 2667-3053 |
DOI: | 10.1016/j.iswa.2022.200144 |
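
The hybrid Chi2-MI selection mentioned in the summary could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the summary does not say how the two scores are combined, so the rule used here (keep the features ranked in the top k by both the chi-squared statistic and mutual information) is an assumption, and the feature names and toy values are hypothetical. Both scores are computed from scratch for discrete features using only the Python standard library.

```python
from collections import Counter
from math import log


def chi2_score(x, y):
    """Pearson chi-squared statistic of the contingency table of x versus y."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    stat = 0.0
    for a in px:
        for b in py:
            expected = px[a] * py[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_info(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum(
        (count / n) * log(count * n / (px[a] * py[b]))
        for (a, b), count in joint.items()
    )


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def hybrid_select(features, target, k):
    """Assumed combination rule: keep features in the top k under both scores."""
    chi = {name: chi2_score(col, target) for name, col in features.items()}
    mi = {name: mutual_info(col, target) for name, col in features.items()}
    return top_k(chi, k) & top_k(mi, k)


# Hypothetical toy data: three binary features and a binary CKD label.
features = {
    "informative": [1, 1, 1, 1, 0, 0, 0, 0],
    "partly": [1, 1, 1, 0, 1, 0, 0, 0],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
}
target = [1, 1, 1, 1, 0, 0, 0, 0]

selected = hybrid_select(features, target, k=2)
print(sorted(selected))
```

On this toy data the feature that is independent of the label scores zero under both measures and is dropped. Taking the intersection of the two rankings is the conservative choice; a union or a weighted sum of normalized scores would be equally plausible readings of "hybrid".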