
Clinical risk prediction models and informative cluster size: Assessing the performance of a suicide risk prediction algorithm

Bibliographic Details
Published in: Biometrical Journal 2021-10, Vol. 63 (7), p. 1375-1388
Main Authors: Coley, Rebecca Yates, Walker, Rod L., Cruz, Maricela, Simon, Gregory E., Shortreed, Susan M.
Format: Article
Language:English
Summary: Clinical visit data are clustered within people, which complicates prediction modeling. Cluster size is often informative because people receiving more care are less healthy and at higher risk of poor outcomes. We used data from seven health systems on 1,518,968 outpatient mental health visits from January 1, 2012 to June 30, 2015 to predict suicide attempt within 90 days. We evaluated true performance of prediction models using a prospective validation set of 4,286,495 visits from October 1, 2015 to September 30, 2017. We examined dividing clustered data on the person or visit level for model training and cross‐validation and considered a within‐cluster resampling approach for model estimation. We evaluated optimism by comparing estimated performance from a left‐out testing dataset to performance in the prospective dataset. We used two prediction methods, logistic regression with least absolute shrinkage and selection operator (LASSO) and random forest. The random forest model using a visit‐level split for model training and testing was optimistic; it overestimated discrimination (area under the curve, AUC = 0.95 in testing versus 0.84 in prospective validation) and classification accuracy (sensitivity = 0.48 in testing versus 0.19 in prospective validation, 95th percentile cut‐off). Logistic regression and random forest models using a person‐level split performed well, accurately estimating prospective discrimination and classification: estimated AUCs ranged from 0.85 to 0.87 in testing versus 0.85 in prospective validation, and sensitivity ranged from 0.15 to 0.20 in testing versus 0.17 to 0.19 in prospective validation. Within‐cluster resampling did not improve performance. We recommend dividing clustered data on the person level, rather than visit level, to ensure strong performance in prospective use and accurate estimation of future performance at the time of model development.
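
The central methodological point, splitting clustered visit data at the person level so that all of a person's visits fall on the same side of the train/test divide, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses hypothetical synthetic data and scikit-learn's GroupShuffleSplit together with an L1-penalized (LASSO) logistic regression, and omits the random forest model and the within-cluster resampling approach evaluated in the paper.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical clustered data: ~2,000 people, each with repeated visits.
n_people = 2000
person_risk = rng.normal(size=n_people)

# Higher-risk people accrue more visits (informative cluster size).
visits_per_person = 1 + rng.poisson(np.exp(1.0 + 0.5 * person_risk))
person_id = np.repeat(np.arange(n_people), visits_per_person)
n_visits = person_id.size

# Person-level risk shifts the visit-level covariates and drives the outcome.
X = rng.normal(size=(n_visits, 10)) + 0.5 * person_risk[person_id, None]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(person_risk[person_id] - 2.0))))

# Person-level split: every visit from a person stays on one side.
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_p, test_p = next(gss.split(X, y, groups=person_id))

# Visit-level split: a person's visits can land in both training and test.
train_v, test_v = train_test_split(np.arange(n_visits), test_size=0.25,
                                   random_state=0)

def fit_and_score(train_idx, test_idx):
    # L1-penalized (LASSO) logistic regression fit on the training visits,
    # scored by AUC on the held-out visits.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    model.fit(X[train_idx], y[train_idx])
    return roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])

print("AUC with person-level split:", round(fit_and_score(train_p, test_p), 3))
print("AUC with visit-level split: ", round(fit_and_score(train_v, test_v), 3))

Because a person's visits are correlated, synthetic data like this tends to show a higher (optimistic) AUC under the visit-level split than under the person-level split, mirroring the pattern the abstract reports for the visit-level random forest model.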
ISSN: 0323-3847, 1521-4036
DOI: 10.1002/bimj.202000199