Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques

Bibliographic Details
Published in:	Mobile information systems 2022-03, Vol.2022, p.1-16
Main Authors:	Mushtaq, Zaigham; Ramzan, Muhammad Farhan; Ali, Sikandar; Baseer, Samad; Samad, Ali; Husnain, Mujtaba
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Bayes Theorem; Blindness; Body mass index; Classification; Datasets; Diabetes; Diabetes mellitus; Disease; Electronic health records; Glucose; Health care; Hyperglycemia; Insulin; Machine learning; Medical research; Outliers (statistics); Oversampling; Preconditioning; Support vector machines; Voting

cited_by	cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3
cites	cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3
container_end_page	16
container_issue
container_start_page	1
container_title	Mobile information systems
container_volume	2022
creator	Mushtaq, Zaigham; Ramzan, Muhammad Farhan; Ali, Sikandar; Baseer, Samad; Samad, Ali; Husnain, Mujtaba
description	Diabetes mellitus is a chronic condition characterized by hyperglycemia. Given its growing morbidity, the number of people with diabetes worldwide is estimated to exceed 642 million by 2040, meaning that one in ten adults will have diabetes. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The value of predicting diabetes in advance motivated us to develop a machine learning-based model. A dataset was obtained from an online repository for this work. The dataset was imbalanced, and an imbalanced dataset must be balanced before it can be used for prediction; resampling techniques such as Tomek links and SMOTE were used for this purpose. Outliers and incomplete records in the dataset were also removed, with outliers managed using the interquartile range (IQR) method. This research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boosting, Naive Bayes, and Random Forest classifiers were applied to determine their predictive performance based on patients’ preconditions. At this stage, Random Forest performed best, with an accuracy of 80.7% after the SMOTE oversampling technique was applied to balance the dataset. In the second stage, three of the better-performing models (Naive Bayes, Gradient Boosting Classifier, and Random Forest) were combined using a voting algorithm. The results were encouraging: the voting model obtained 82.0% accuracy on the default dataset and 81.7% accuracy on the balanced dataset.
doi_str_mv	10.1155/2022/6521532
format	article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2643818447</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2643818447</sourcerecordid><originalsourceid>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</originalsourceid><addsrcrecordid>eNp9kMtOwzAQRS0EEqWw4wMisYRQP2NnCeVRpFawaFF3keNMqKuQFNsR6t_jqF2zmTvSnJmruQhdE3xPiBATiimdZIISwegJGhElRZpjsT6NvZA8xUSuz9GF91uMM8yEHCH47IJtv5Jpo723tTU62K5NH7WHKnmyuoQAPllA09jQ--TDQWXNgCQrP-zN9jtwoW8jvdBmY1tI56BdO8yWYDat_enBX6KzWjcero46RquX5-V0ls7fX9-mD_PUMCZDqkTGsRKKGKqVpnmsxmBT54zrqKoyIClXNRO4LCWQkmZMYq6pqHPAomJjdHO4u3Pd4BuKbde7NloWNONMEcW5jNTdgTKu895BXeyc_dZuXxBcDEEWQ5DFMciI3x7w-F2lf-3_9B8_b3Mn</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2643818447</pqid></control><display><type>article</type><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><source>Wiley_OA刊</source><creator>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba</creator><contributor>Farouk, Ahmed ; Ahmed Farouk</contributor><creatorcontrib>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba ; Farouk, Ahmed ; Ahmed Farouk</creatorcontrib><description>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The prediction of diabetes in advance motivates us to develop a machine learning-based model. A dataset was obtained from the online repository for this work. 
The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</description><identifier>ISSN: 1574-017X</identifier><identifier>EISSN: 1875-905X</identifier><identifier>DOI: 10.1155/2022/6521532</identifier><language>eng</language><publisher>Amsterdam: Hindawi</publisher><subject>Accuracy ; Algorithms ; Bayes Theorem ; Blindness ; Body mass index ; Classification ; Datasets ; Diabetes ; Diabetes mellitus ; Disease ; Electronic health records ; Glucose ; Health care ; Hyperglycemia ; Insulin ; Machine learning ; Medical research ; Outliers (statistics) ; Oversampling ; Preconditioning ; Support vector machines ; Voting</subject><ispartof>Mobile information systems, 2022-03, Vol.2022, p.1-16</ispartof><rights>Copyright © 2022 Zaigham Mushtaq et al.</rights><rights>Copyright © 2022 Zaigham Mushtaq et al. 
This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</citedby><cites>FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</cites><orcidid>0000-0002-3754-3450 ; 0000-0001-6061-1987 ; 0000-0001-9987-2535 ; 0000-0002-9964-4716 ; 0000-0002-2753-8615</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><contributor>Farouk, Ahmed</contributor><contributor>Ahmed Farouk</contributor><creatorcontrib>Mushtaq, Zaigham</creatorcontrib><creatorcontrib>Ramzan, Muhammad Farhan</creatorcontrib><creatorcontrib>Ali, Sikandar</creatorcontrib><creatorcontrib>Baseer, Samad</creatorcontrib><creatorcontrib>Samad, Ali</creatorcontrib><creatorcontrib>Husnain, Mujtaba</creatorcontrib><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><title>Mobile information systems</title><description>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. 
The prediction of diabetes in advance motivates us to develop a machine learning-based model. A dataset was obtained from the online repository for this work. The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. 
Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Bayes Theorem</subject><subject>Blindness</subject><subject>Body mass index</subject><subject>Classification</subject><subject>Datasets</subject><subject>Diabetes</subject><subject>Diabetes mellitus</subject><subject>Disease</subject><subject>Electronic health records</subject><subject>Glucose</subject><subject>Health care</subject><subject>Hyperglycemia</subject><subject>Insulin</subject><subject>Machine learning</subject><subject>Medical research</subject><subject>Outliers (statistics)</subject><subject>Oversampling</subject><subject>Preconditioning</subject><subject>Support vector machines</subject><subject>Voting</subject><issn>1574-017X</issn><issn>1875-905X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOwzAQRS0EEqWw4wMisYRQP2NnCeVRpFawaFF3keNMqKuQFNsR6t_jqF2zmTvSnJmruQhdE3xPiBATiimdZIISwegJGhElRZpjsT6NvZA8xUSuz9GF91uMM8yEHCH47IJtv5Jpo723tTU62K5NH7WHKnmyuoQAPllA09jQ--TDQWXNgCQrP-zN9jtwoW8jvdBmY1tI56BdO8yWYDat_enBX6KzWjcero46RquX5-V0ls7fX9-mD_PUMCZDqkTGsRKKGKqVpnmsxmBT54zrqKoyIClXNRO4LCWQkmZMYq6pqHPAomJjdHO4u3Pd4BuKbde7NloWNONMEcW5jNTdgTKu895BXeyc_dZuXxBcDEEWQ5DFMciI3x7w-F2lf-3_9B8_b3Mn</recordid><startdate>20220319</startdate><enddate>20220319</enddate><creator>Mushtaq, Zaigham</creator><creator>Ramzan, Muhammad Farhan</creator><creator>Ali, Sikandar</creator><creator>Baseer, Samad</creator><creator>Samad, Ali</creator><creator>Husnain, Mujtaba</creator><general>Hindawi</general><general>Hindawi 
Limited</general><scope>RHU</scope><scope>RHW</scope><scope>RHX</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3754-3450</orcidid><orcidid>https://orcid.org/0000-0001-6061-1987</orcidid><orcidid>https://orcid.org/0000-0001-9987-2535</orcidid><orcidid>https://orcid.org/0000-0002-9964-4716</orcidid><orcidid>https://orcid.org/0000-0002-2753-8615</orcidid></search><sort><creationdate>20220319</creationdate><title>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</title><author>Mushtaq, Zaigham ; Ramzan, Muhammad Farhan ; Ali, Sikandar ; Baseer, Samad ; Samad, Ali ; Husnain, Mujtaba</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Bayes Theorem</topic><topic>Blindness</topic><topic>Body mass index</topic><topic>Classification</topic><topic>Datasets</topic><topic>Diabetes</topic><topic>Diabetes mellitus</topic><topic>Disease</topic><topic>Electronic health records</topic><topic>Glucose</topic><topic>Health care</topic><topic>Hyperglycemia</topic><topic>Insulin</topic><topic>Machine learning</topic><topic>Medical research</topic><topic>Outliers (statistics)</topic><topic>Oversampling</topic><topic>Preconditioning</topic><topic>Support vector machines</topic><topic>Voting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mushtaq, Zaigham</creatorcontrib><creatorcontrib>Ramzan, Muhammad Farhan</creatorcontrib><creatorcontrib>Ali, Sikandar</creatorcontrib><creatorcontrib>Baseer, Samad</creatorcontrib><creatorcontrib>Samad, 
Ali</creatorcontrib><creatorcontrib>Husnain, Mujtaba</creatorcontrib><collection>Hindawi Publishing Complete</collection><collection>Hindawi Publishing Subscription Journals</collection><collection>Hindawi Publishing Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Mobile information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mushtaq, Zaigham</au><au>Ramzan, Muhammad Farhan</au><au>Ali, Sikandar</au><au>Baseer, Samad</au><au>Samad, Ali</au><au>Husnain, Mujtaba</au><au>Farouk, Ahmed</au><au>Ahmed Farouk</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques</atitle><jtitle>Mobile information systems</jtitle><date>2022-03-19</date><risdate>2022</risdate><volume>2022</volume><spage>1</spage><epage>16</epage><pages>1-16</pages><issn>1574-017X</issn><eissn>1875-905X</eissn><abstract>Diabetes mellitus is a hyperglycemia-like chronic condition that is a troublesome disease. It is estimated that, according to the growing morbidity, by 2040, the world will cross 642 million diabetic patients. This means that each one of the ten adults will be diabetes-affected. Diabetes can also lead to other illnesses such as heart attacks, kidney damage, and even blindness. The prediction of diabetes in advance motivates us to develop a machine learning-based model. 
A dataset was obtained from the online repository for this work. The obtained dataset was imbalanced. An imbalanced dataset presents a challenge that is needed to be balanced for prediction using multiple machine learning like Tomek and SMOTE. These techniques remove necessary outliers that are incomplete in the provided dataset. These outliers are also managed using the IQR method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boost, Naive Bayes, and Random Forests were applied to determine the efficiency of prediction based on patients’ preconditioning. At this stage, Random Forest was found to be the best with an accuracy of 80.7% after applying SMOTE oversampling technique to balance the dataset. In the second stage, three better-performing models were used by utilizing a voting algorithm. The results were encouraging, and the model obtained 82.0% accuracy with the default dataset and 81.7% accuracy with the balanced dataset. Naive Bayes Theorem, Gradient Boosting Classifier, and Random Forest were used as inputs to the voting algorithm.</abstract><cop>Amsterdam</cop><pub>Hindawi</pub><doi>10.1155/2022/6521532</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-3754-3450</orcidid><orcidid>https://orcid.org/0000-0001-6061-1987</orcidid><orcidid>https://orcid.org/0000-0001-9987-2535</orcidid><orcidid>https://orcid.org/0000-0002-9964-4716</orcidid><orcidid>https://orcid.org/0000-0002-2753-8615</orcidid><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	ISSN: 1574-017X
ispartof	Mobile information systems, 2022-03, Vol.2022, p.1-16
issn	1574-017X 1875-905X
language	eng
recordid	cdi_proquest_journals_2643818447
source	Wiley_OA刊
subjects	Accuracy; Algorithms; Bayes Theorem; Blindness; Body mass index; Classification; Datasets; Diabetes; Diabetes mellitus; Disease; Electronic health records; Glucose; Health care; Hyperglycemia; Insulin; Machine learning; Medical research; Outliers (statistics); Oversampling; Preconditioning; Support vector machines; Voting
title	Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T15%3A34%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Voting%20Classification-Based%20Diabetes%20Mellitus%20Prediction%20Using%20Hypertuned%20Machine-Learning%20Techniques&rft.jtitle=Mobile%20information%20systems&rft.au=Mushtaq,%20Zaigham&rft.date=2022-03-19&rft.volume=2022&rft.spage=1&rft.epage=16&rft.pages=1-16&rft.issn=1574-017X&rft.eissn=1875-905X&rft_id=info:doi/10.1155/2022/6521532&rft_dat=%3Cproquest_cross%3E2643818447%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c337t-856408581c2a8a292a8cc0cf934ac0c8dce7248f350bb7e1b263704a25f9e05d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2643818447&rft_id=info:pmid/&rfr_iscdi=true
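The abstract of this record names its preprocessing and ensembling steps (IQR outlier handling, SMOTE oversampling, and a voting classifier over Naive Bayes, Gradient Boosting, and Random Forest) without describing them. Below is a minimal, standard-library Python sketch of the textbook form of three of these steps; it is not the authors' implementation. Assumptions: features are numeric tuples, the conventional 1.5 × IQR (Tukey) fences are used, SMOTE interpolates toward one of the k nearest minority-class neighbours, and voting is hard (majority) voting, since the abstract does not say whether hard or soft voting was used. The function names `iqr_bounds`, `drop_iqr_outliers`, `smote`, and `hard_vote` are hypothetical.

```python
import random
import statistics
from collections import Counter

# Illustrative sketch only: function names and defaults are assumptions,
# not taken from the paper.


def iqr_bounds(values, k=1.5):
    """Return Tukey fences (Q1 - k*IQR, Q3 + k*IQR); k=1.5 is an assumed default."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # Rank the other minority samples by squared Euclidean distance to x.
        neighbours = sorted(
            others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m))
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def hard_vote(predictions):
    """Hard (majority) voting: predictions[m][i] is model m's label for sample i.
    Ties go to the label seen first (Counter ordering); an odd number of models
    avoids ties for binary labels."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

With three base models and binary labels, as in the second stage described in the abstract, hard voting cannot tie. For real experiments, tested implementations exist: `SMOTE` and `TomekLinks` in the imbalanced-learn package, and `VotingClassifier` (with `voting="hard"` or `voting="soft"`) in scikit-learn.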