Development of machine learning-based models to predict congenital heart disease: A matched case-control study

Bibliographic Details
Published in: International journal of medical informatics (Shannon, Ireland), 2025-03, Vol. 195, Article 105741
Main Authors: Zhang, Shutong, Kang, Chenxi, Cui, Jing, Xue, Haodan, Zhao, Shanshan, Chen, Yukui, Lu, Haixia, Ye, Lu, Wang, Duolao, Chen, Fangyao, Zhao, Yaling, Pei, Leilei, Qu, Pengfei
Format: Article
Language: English
Description
Summary:
• XGB is the top-performing model for CHD prediction among five machine learning models.
• The risk score can forecast the risk of CHD in fetuses of pregnant women during early pregnancy.
• Estimated risks of CHD are categorized into three classes: low, moderate, and high.
• The risk score serves as a user-friendly risk assessment tool on the web.

Current congenital heart disease (CHD) prediction tools lack adequate interpretability and convenience, hindering the development of personalized CHD management strategies. We developed a machine learning-based risk stratification model for CHD prediction. This study used data from 1,759 participants in a case-control study of CHD conducted across six birth defects surveillance hospitals in Xi’an, Shaanxi Province, Northwest China, from January 2014 to December 2016. The data were partitioned into training and testing datasets at a ratio of 7:3. Predictors were selected from a total of 47 input variables using the Least Absolute Shrinkage and Selection Operator (LASSO). Five machine learning algorithms were used to build the CHD risk prediction models. Model performance was assessed with a range of metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, and Brier score. Permutation feature importance was used to interpret the prediction model. The best-performing model was used to construct the risk scores. The eXtreme Gradient Boosting (XGB) model performed best among the CHD prediction models, achieving an AUROC of 0.772 (95% CI 0.728, 0.817) in the testing dataset and 0.738 (0.699, 0.775) in the external validation dataset. The pivotal predictors (top 3) identified by the model included living in rural areas, the low wealth index, and folic acid supplements (
ISSN: 1386-5056
DOI: 10.1016/j.ijmedinf.2024.105741
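
For readers unfamiliar with the three evaluation metrics named in the summary (AUROC, F1 score, and Brier score), the following is a minimal pure-Python sketch of how each can be computed from binary outcomes and predicted probabilities. It is not the authors' code. The function names, the 0.5 classification threshold used for the F1 score, and the toy data in the usage note are assumptions made for illustration only.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def f1_score(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 score after thresholding probabilities.
    The 0.5 default is an assumption, not a value from the study."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def brier_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome.
    Lower is better; 0 is a perfect probabilistic forecast."""
    if not y_true:
        raise ValueError("Brier score needs at least one observation")
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)
```

As a usage example on made-up data, calling these functions with labels `[1, 1, 1, 0, 0, 0]` and predicted probabilities `[0.9, 0.7, 0.4, 0.6, 0.3, 0.1]` gives an AUROC of 8/9, an F1 score of 2/3, and a Brier score of about 0.153. The pairwise AUROC loop is quadratic in sample size, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger datasets.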