Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques

Diabetes mellitus is a troublesome chronic condition marked by hyperglycemia. Given growing morbidity, the number of diabetic patients worldwide is estimated to exceed 642 million by 2040, meaning that one in ten adults will be affected. Diabetes can also le...

Bibliographic Details
Published in: Mobile information systems 2022-03, Vol.2022, p.1-16
Main Authors: Mushtaq, Zaigham, Ramzan, Muhammad Farhan, Ali, Sikandar, Baseer, Samad, Samad, Ali, Husnain, Mujtaba
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3
cites cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3
container_end_page 16
container_issue
container_start_page 1
container_title Mobile information systems
container_volume 2022
creator Mushtaq, Zaigham
Ramzan, Muhammad Farhan
Ali, Sikandar
Baseer, Samad
Samad, Ali
Husnain, Mujtaba
description Diabetes mellitus is a troublesome chronic condition marked by hyperglycemia. Given growing morbidity, the number of diabetic patients worldwide is estimated to exceed 642 million by 2040, meaning that one in ten adults will be affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. Predicting diabetes in advance is the motivation for developing a machine learning-based model. The dataset for this work was obtained from an online repository and was imbalanced. Because class imbalance is a challenge for prediction, the dataset was balanced using resampling techniques such as Tomek and SMOTE. These techniques also remove outlying records that are incomplete in the provided dataset, and outliers are additionally managed using the IQR method. The research employed a two-stage model selection methodology. In the first stage, Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Gradient Boosting, Naive Bayes, and Random Forest were applied to assess prediction performance based on patients’ preconditions. At this stage, Random Forest performed best, with an accuracy of 80.7% after the SMOTE oversampling technique was applied to balance the dataset. In the second stage, the three better-performing models (Naive Bayes, Gradient Boosting Classifier, and Random Forest) were combined using a voting algorithm. The results were encouraging: the voting model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset.
doi_str_mv 10.1155/2022/6521532
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2643818447</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2643818447</sourcerecordid><originalsourceid>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEqWw4wMisYRQP2NnCeVRpFawaFF3keNMqKuQFNsR6t_jqF2zmTvSnJmruQhdE3xPiBATiimdZIISwegJGhElRZpjsT6NvZA8xUSuz9GF91uMM8yEHCH47IJtv5Jpo723tTU62K5NH7WHKnmyuoQAPllA09jQ--TDQWXNgCQrP-zN9jtwoW8jvdBmY1tI56BdO8yWYDat_enBX6KzWjcero46RquX5-V0ls7fX9-mD_PUMCZDqkTGsRKKGKqVpnmsxmBT54zrqKoyIClXNRO4LCWQkmZMYq6pqHPAomJjdHO4u3Pd4BuKbde7NloWNONMEcW5jNTdgTKu895BXeyc_dZuXxBcDEEWQ5DFMciI3x7w-F2lf-3_9B8_b3Mn</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2643818447</pqid></control><display><type>article</type><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><source>Wiley_OA刊</source><creator>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba</creator><contributor>Farouk, Ahmed ; Ahmed Farouk</contributor><creatorcontrib>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba ; Farouk, Ahmed ; Ahmed Farouk</creatorcontrib><description>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The prediction of diabetes in advance motivates us to develop a machine learning-based model. A dataset was obtained from the online repository for this work. 
The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</description><identifier>ISSN: 1574-017X</identifier><identifier>EISSN: 1875-905X</identifier><identifier>DOI: 10.1155/2022/6521532</identifier><language>eng</language><publisher>Amsterdam: Hindawi</publisher><subject>Accuracy ; Algorithms ; Bayes Theorem ; Blindness ; Body mass index ; Classification ; Datasets ; Diabetes ; Diabetes mellitus ; Disease ; Electronic health records ; Glucose ; Health care ; Hyperglycemia ; Insulin ; Machine learning ; Medical research ; Outliers (statistics) ; Oversampling ; Preconditioning ; Support vector machines ; Voting</subject><ispartof>Mobile information systems, 2022-03, Vol.2022, p.1-16</ispartof><rights>Copyright © 2022 Zaigham Mushtaq et al.</rights><rights>Copyright © 2022 Zaigham Mushtaq et al. 
This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</citedby><cites>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</cites><orcidid>0000-0002-3754-3450 ; 0000-0001-6061-1987 ; 0000-0001-9987-2535 ; 0000-0002-9964-4716 ; 0000-0002-2753-8615</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><contributor>Farouk, Ahmed</contributor><contributor>Ahmed Farouk</contributor><creatorcontrib>Mushtaq, Zaigham</creatorcontrib><creatorcontrib>Ramzan, Muhammad Farhan</creatorcontrib><creatorcontrib>Ali, Sikandar</creatorcontrib><creatorcontrib>Baseer, Samad</creatorcontrib><creatorcontrib>Samad, Ali</creatorcontrib><creatorcontrib>Husnain, Mujtaba</creatorcontrib><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><title>Mobile information systems</title><description>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. 
The prediction of diabetes in advance motivates us to develop a machine learning-based model. A dataset was obtained from the online repository for this work. The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. 
Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Bayes Theorem</subject><subject>Blindness</subject><subject>Body mass index</subject><subject>Classification</subject><subject>Datasets</subject><subject>Diabetes</subject><subject>Diabetes mellitus</subject><subject>Disease</subject><subject>Electronic health records</subject><subject>Glucose</subject><subject>Health care</subject><subject>Hyperglycemia</subject><subject>Insulin</subject><subject>Machine learning</subject><subject>Medical research</subject><subject>Outliers (statistics)</subject><subject>Oversampling</subject><subject>Preconditioning</subject><subject>Support vector machines</subject><subject>Voting</subject><issn>1574-017X</issn><issn>1875-905X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOwzAQRS0EEqWw4wMisYRQP2NnCeVRpFawaFF3keNMqKuQFNsR6t_jqF2zmTvSnJmruQhdE3xPiBATiimdZIISwegJGhElRZpjsT6NvZA8xUSuz9GF91uMM8yEHCH47IJtv5Jpo723tTU62K5NH7WHKnmyuoQAPllA09jQ--TDQWXNgCQrP-zN9jtwoW8jvdBmY1tI56BdO8yWYDat_enBX6KzWjcero46RquX5-V0ls7fX9-mD_PUMCZDqkTGsRKKGKqVpnmsxmBT54zrqKoyIClXNRO4LCWQkmZMYq6pqHPAomJjdHO4u3Pd4BuKbde7NloWNONMEcW5jNTdgTKu895BXeyc_dZuXxBcDEEWQ5DFMciI3x7w-F2lf-3_9B8_b3Mn</recordid><startdate>20220319</startdate><enddate>20220319</enddate><creator>Mushtaq, Zaigham</creator><creator>Ramzan, Muhammad Farhan</creator><creator>Ali, Sikandar</creator><creator>Baseer, Samad</creator><creator>Samad, Ali</creator><creator>Husnain, Mujtaba</creator><general>Hindawi</general><general>Hindawi 
Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3754-3450</orcidid><orcidid>https://orcid.org/0000-0001-6061-1987</orcidid><orcidid>https://orcid.org/0000-0001-9987-2535</orcidid><orcidid>https://orcid.org/0000-0002-9964-4716</orcidid><orcidid>https://orcid.org/0000-0002-2753-8615</orcidid></search><sort><creationdate>20220319</creationdate><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><author>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Bayes Theorem</topic><topic>Blindness</topic><topic>Body mass index</topic><topic>Classification</topic><topic>Datasets</topic><topic>Diabetes</topic><topic>Diabetes mellitus</topic><topic>Disease</topic><topic>Electronic health records</topic><topic>Glucose</topic><topic>Health care</topic><topic>Hyperglycemia</topic><topic>Insulin</topic><topic>Machine learning</topic><topic>Medical research</topic><topic>Outliers (statistics)</topic><topic>Oversampling</topic><topic>Preconditioning</topic><topic>Support vector machines</topic><topic>Voting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mushtaq, Zaigham</creatorcontrib><creatorcontrib>Ramzan, Muhammad Farhan</creatorcontrib><creatorcontrib>Ali, Sikandar</creatorcontrib><creatorcontrib>Baseer, Samad</creatorcontrib><creatorcontrib>Samad, 
Ali</creatorcontrib><creatorcontrib>Husnain, Mujtaba</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Mobile information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mushtaq, Zaigham</au><au>Ramzan, Muhammad Farhan</au><au>Ali, Sikandar</au><au>Baseer, Samad</au><au>Samad, Ali</au><au>Husnain, Mujtaba</au><au>Farouk, Ahmed</au><au>Ahmed Farouk</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</atitle><jtitle>Mobile information systems</jtitle><date>2022-03-19</date><risdate>2022</risdate><volume>2022</volume><spage>1</spage><epage>16</epage><pages>1-16</pages><issn>1574-017X</issn><eissn>1875-905X</eissn><abstract>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The prediction of diabetes in advance motivates us to develop a machine learning-based model. 
A dataset was obtained from the online repository for this work. The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</abstract><cop>Amsterdam</cop><pub>Hindawi</pub><doi>10.1155/2022/6521532</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-3754-3450</orcidid><orcidid>https://orcid.org/0000-0001-6061-1987</orcidid><orcidid>https://orcid.org/0000-0001-9987-2535</orcidid><orcidid>https://orcid.org/0000-0002-9964-4716</orcidid><orcidid>https://orcid.org/0000-0002-2753-8615</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1574-017X
ispartof Mobile information systems, 2022-03, Vol.2022, p.1-16
issn 1574-017X
1875-905X
language eng
recordid cdi_proquest_journals_2643818447
source Wiley_OA刊
subjects Accuracy
Algorithms
Bayes Theorem
Blindness
Body mass index
Classification
Datasets
Diabetes
Diabetes mellitus
Disease
Electronic health records
Glucose
Health care
Hyperglycemia
Insulin
Machine learning
Medical research
Outliers (statistics)
Oversampling
Preconditioning
Support vector machines
Voting
title Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T15%3A34%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Voting%20Classification-Based%20Diabetes%20Mellitus%20Prediction%20Using%20Hypertuned%20Machine-Learning%20Techniques&rft.jtitle=Mobile%20information%20systems&rft.au=Mushtaq,%20Zaigham&rft.date=2022-03-19&rft.volume=2022&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1574-017X&rft.eissn=1875-905X&rft_id=info:doi/10.1155/2022/6521532&rft_dat=%3Cproquest_cross%3E2643818447%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2643818447&rft_id=info:pmid/&rfr_iscdi=true
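
The description in this record mentions two steps that are easy to illustrate: IQR-based outlier handling and a voting step over three classifiers. The following is a minimal, standard-library sketch of both ideas, not the authors' implementation. The function names, the conventional 1.5 fence multiplier, and the use of hard (majority) voting are assumptions; the record does not state whether the article used hard or soft voting.

```python
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier (an assumption here).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def hard_vote(predictions):
    """Majority vote over per-model label lists of equal length.

    predictions holds one list of labels per model; the result holds
    the most frequent label for each sample.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical measurements with one extreme value.
    readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    print(drop_outliers(readings))

    # Hypothetical binary predictions from three models.
    naive_bayes = [1, 0, 1, 0]
    gradient_boosting = [1, 1, 0, 0]
    random_forest = [0, 1, 1, 0]
    print(hard_vote([naive_bayes, gradient_boosting, random_forest]))
```

With three voters and binary labels a tie cannot occur, which keeps the majority rule unambiguous; soft voting would instead average predicted class probabilities.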