An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier

Bibliographic Details
Published in: International Journal of Cognitive Computing in Engineering, 2021-06, Vol. 2, p. 40-46
Main Authors: Kumari, Saloni, Kumar, Deepika, Mittal, Mamta
Format: Article
Language: English
Description
Summary:
• The World Health Organization (WHO) reported that the global prevalence of diabetes among adults has nearly doubled since 1980, rising from 4.7% to 8.5%. In 2012, 1.5 million people died due to diabetes.
• Early identification of the disease helps to recognize and avoid its complications. Machine learning models support early-stage recognition of diabetes based on physical data, and they have shown the ability to handle large numbers of variables efficiently and robustly while building strong predictive models.
• The authors therefore propose an ensemble of machine learning algorithms, viz. random forest, logistic regression, and Naïve Bayes, with a soft voting classifier for the binary classification of the disease into positive and negative.
• Accuracy, precision, recall, F1-score, and AUC value have been taken as the evaluation criteria.
• Two datasets have been used for experimentation, i.e. the PIMA diabetes dataset and a breast cancer dataset. The performance of the proposed methodology has been compared and analysed against conventional machine learning algorithms on both datasets.

Diabetes is a serious disease characterized by elevated levels of glucose in the blood. Machine learning algorithms help in the identification and prediction of diabetes at an early stage. The main objective of this study is to predict diabetes mellitus with better accuracy using an ensemble of machine learning algorithms. The Pima Indians Diabetes dataset, which records details of patients with and without diabetes, has been used for experimentation. The proposed ensemble soft voting classifier performs binary classification using an ensemble of three machine learning algorithms, viz. random forest, logistic regression, and Naïve Bayes.
Empirical evaluation of the proposed methodology has been conducted against state-of-the-art methodologies and base classifiers such as AdaBoost, Logistic Regression, Support Vector Machine, Random Forest, Naïve Bayes, Bagging, GradientBoost, XGBoost, and CatBoost, taking accuracy, precision, recall, and F1-score as the evaluation criteria. The proposed ensemble approach gives the highest accuracy, precision, recall, and F1-score values, with 79.04%, 73.48%, 71.45% and 80.6% respectively on the PIMA diabetes dataset. Further, the efficiency of the proposed methodology has also been compared and analysed on the breast cancer dataset. The proposed ensemble soft voting classifier has given 97.02% accuracy on the breast cancer da
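
The soft voting step named in the summary can be illustrated with a minimal sketch. It assumes each base classifier (random forest, logistic regression, Naïve Bayes) outputs class probabilities for a patient, and that the ensemble averages them with equal weights before picking the most probable class. The function name `soft_vote`, the equal weighting, and the probability values are hypothetical and chosen for illustration only; they are not taken from the article.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    probabilities: one list per model, each [P(negative), P(positive)].
    Returns (predicted_label, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)  # assumption: equal weights
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    # The predicted class is the one with the highest averaged probability.
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical probabilities for a single patient (not from the article).
model_probs = [
    [0.55, 0.45],  # random forest leans negative
    [0.55, 0.45],  # logistic regression leans negative
    [0.10, 0.90],  # Naive Bayes is confidently positive
]

label, averaged = soft_vote(model_probs)

# Hard (majority) voting on the same outputs, for comparison.
hard_votes = [max(range(2), key=lambda c: p[c]) for p in model_probs]
hard_label = max(set(hard_votes), key=hard_votes.count)

print("soft vote:", label, averaged)
print("hard vote:", hard_label)
```

In this made-up case the majority of models vote negative, while the averaged probabilities favour the positive class, which shows how soft voting lets a confident model outweigh two uncertain ones.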
ISSN: 2666-3074
DOI: 10.1016/j.ijcce.2021.01.001