Clinical risk prediction models and informative cluster size: Assessing the performance of a suicide risk prediction algorithm
Published in: | Biometrical journal 2021-10, Vol.63 (7), p.1375-1388 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Clinical visit data are clustered within people, which complicates prediction modeling. Cluster size is often informative because people receiving more care are less healthy and at higher risk of poor outcomes. We used data from seven health systems on 1,518,968 outpatient mental health visits from January 1, 2012 to June 30, 2015 to predict suicide attempt within 90 days. We evaluated true performance of prediction models using a prospective validation set of 4,286,495 visits from October 1, 2015 to September 30, 2017. We examined dividing clustered data on the person or visit level for model training and cross‐validation and considered a within cluster resampling approach for model estimation. We evaluated optimism by comparing estimated performance from a left‐out testing dataset to performance in the prospective dataset. We used two prediction methods, logistic regression with least absolute shrinkage and selection operator (LASSO) and random forest. The random forest model using a visit‐level split for model training and testing was optimistic; it overestimated discrimination (area under the curve, AUC = 0.95 in testing versus 0.84 in prospective validation) and classification accuracy (sensitivity = 0.48 in testing versus 0.19 in prospective validation, 95th percentile cut‐off). Logistic regression and random forest models using a person‐level split performed well, accurately estimating prospective discrimination and classification: estimated AUCs ranged from 0.85 to 0.87 in testing versus 0.85 in prospective validation, and sensitivity ranged from 0.15 to 0.20 in testing versus 0.17 to 0.19 in prospective validation. Within cluster resampling did not improve performance. We recommend dividing clustered data on the person level, rather than visit level, to ensure strong performance in prospective use and accurate estimation of future performance at the time of model development. |
ISSN: | 0323-3847, 1521-4036 |
DOI: | 10.1002/bimj.202000199 |
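The summary contrasts splitting clustered visit data at the person level versus the visit level, mentions within cluster resampling, and reports AUC and sensitivity at a 95th percentile cut-off. The Python below is a minimal sketch of those ideas under stated assumptions, not the authors' code. The `Visit` record and every function name are hypothetical. It uses a nearest-rank percentile and flags visits scoring strictly above the cut-off, and both of those conventions are assumptions. It does not fit the LASSO or random forest models; it only shows how data would be divided and how performance would be scored.

```python
import math
import random
from collections import defaultdict
from typing import NamedTuple


class Visit(NamedTuple):
    # Hypothetical record layout: one row per outpatient visit.
    person_id: str
    score: float  # model-predicted risk for this visit
    outcome: int  # 1 if a suicide attempt followed within 90 days, else 0


def person_level_split(visits, test_fraction=0.3, seed=0):
    """Assign whole people (clusters) to train or test; no person appears in both."""
    people = sorted({v.person_id for v in visits})
    random.Random(seed).shuffle(people)
    test_people = set(people[: round(len(people) * test_fraction)])
    train = [v for v in visits if v.person_id not in test_people]
    test = [v for v in visits if v.person_id in test_people]
    return train, test


def visit_level_split(visits, test_fraction=0.3, seed=0):
    """Split individual visits; one person's visits can land on both sides."""
    shuffled = list(visits)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def within_cluster_resample(visits, seed=0):
    """Draw one visit per person so every cluster contributes equally."""
    by_person = defaultdict(list)
    for v in visits:
        by_person[v.person_id].append(v)
    rng = random.Random(seed)
    return [rng.choice(by_person[p]) for p in sorted(by_person)]


def sensitivity_at_percentile(visits, pct=95.0):
    """Share of outcome visits whose score lies above the pct-th percentile score."""
    positives = [v for v in visits if v.outcome == 1]
    if not positives:
        return float("nan")
    scores = sorted(v.score for v in visits)
    # Nearest-rank percentile (assumed convention); other definitions move the cut-off slightly.
    k = max(0, math.ceil(pct * len(scores) / 100) - 1)
    cutoff = scores[k]
    flagged = sum(1 for v in positives if v.score > cutoff)
    return flagged / len(positives)


def auc(visits):
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [v.score for v in visits if v.outcome == 1]
    controls = [v.score for v in visits if v.outcome == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))
```

With a person-level split, every visit from a given person lands in either training or testing. A visit-level split lets the model be tested on people whose other visits it was trained on, which is one way a flexible model such as a random forest can look optimistic in testing.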