Resampling to address inequities in predictive modeling of suicide deaths

Bibliographic Details
Published in:	BMJ health & care informatics 2022-04, Vol.29 (1), p.e100456
Main Authors:	Reeves, Majerle, Bhat, Harish S, Goldman-Mellor, Sidra
Format:	Article
Language:	English
Subjects:	Area Under Curve; Bayes Theorem; Bias; Classification; Codes; Cultural identity; Data Science; Decision Trees; Delivery of Health Care; Equity; Health informatics; Hispanic Americans; Humans; Machine Learning; Medical records; Minority & ethnic groups; Original Research; Pacific Islander people; Patients; Race; Suicide; Suicides & suicide attempts

Description
Summary:	Objective: Improve methodology for equitable suicide death prediction when using sensitive predictors, such as race/ethnicity, for machine learning and statistical methods. Methods: Train predictive models, logistic regression, naive Bayes, gradient boosting (XGBoost) and random forests, using three resampling techniques (Blind, Separate, Equity) on emergency department (ED) administrative patient records. The Blind method resamples without considering racial/ethnic group. Comparatively, the Separate method trains disjoint models for each group and the Equity method builds a training set that is balanced both by racial/ethnic group and by class. Results: Using the Blind method, performance range of the models’ sensitivity for predicting suicide death between racial/ethnic groups (a measure of prediction inequity) was 0.47 for logistic regression, 0.37 for naive Bayes, 0.56 for XGBoost and 0.58 for random forest. By building separate models for different racial/ethnic groups or using the equity method on the training set, we decreased the range in performance to 0.16, 0.13, 0.19, 0.20 with Separate method, and 0.14, 0.12, 0.24, 0.13 for Equity method, respectively. XGBoost had the highest overall area under the curve (AUC), ranging from 0.69 to 0.79. Discussion: We increased performance equity between different racial/ethnic groups and show that imbalanced training sets lead to models with poor predictive equity. These methods have comparable AUC scores to other work in the field, using only single ED administrative record data. Conclusion: We propose two methods to improve equity of suicide death prediction among different racial/ethnic groups. These methods may be applied to other sensitive characteristics to improve equity in machine learning with healthcare applications.
ISSN:	2632-1009
DOI:	10.1136/bmjhci-2021-100456
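
The summary describes the Equity method only as building a training set balanced by both racial/ethnic group and class, and measures inequity as the range of per-group sensitivity. The sketch below illustrates one way those two ideas could be expressed; it is not the authors' implementation. The function names, the record layout (a list of dicts), sampling with replacement, and the fixed `n_per_cell` size are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def equity_resample(records, group_key, label_key, n_per_cell, seed=0):
    """Hypothetical helper: draw n_per_cell records, with replacement,
    from every (group, class) cell so the result is balanced on both."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[group_key], rec[label_key])].append(rec)
    balanced = []
    for cell in sorted(cells):
        # Sampling with replacement is an assumption; it lets small cells
        # reach the same size as large ones.
        balanced.extend(rng.choices(cells[cell], k=n_per_cell))
    rng.shuffle(balanced)
    return balanced


def sensitivity_range(y_true, y_pred, groups):
    """Return (max - min) of per-group sensitivity and the per-group values.
    Sensitivity is true positives divided by all actual positives."""
    positives = defaultdict(int)
    true_positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            true_positives[group] += int(pred == 1)
    per_group = {g: true_positives[g] / positives[g] for g in positives}
    return max(per_group.values()) - min(per_group.values()), per_group
```

A training set produced by `equity_resample` would then be passed to any of the classifiers named in the summary, and `sensitivity_range` applied to held-out predictions to compare resampling strategies. Groups with no positive cases in the evaluation data are skipped by `sensitivity_range`, since their sensitivity is undefined.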