Estimating Classification Consistency of Machine Learning Models for Screening Measures
Published in: | Psychological assessment 2024-06, Vol.36 (6-7), p.395-406 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This article illustrates novel quantitative methods to estimate classification consistency in machine learning models used for screening measures. Screening measures are used in psychology and medicine to classify individuals into diagnostic categories. In addition to achieving high accuracy, a screening process should ideally have high classification consistency, which means that respondents would be classified into the same group every time if the assessment were repeated. Although machine learning models are increasingly being used to predict a screening classification based on individual item responses, methods to describe the classification consistency of machine learning models have not yet been developed. This article addresses this gap by describing methods to estimate classification inconsistency in machine learning models arising from two different sources: sampling error during model fitting and measurement error in the item responses. The methods rely on data resampling techniques such as the bootstrap and Monte Carlo sampling, and they are illustrated using three empirical examples that predict a health condition or diagnosis from item responses. R code is provided to facilitate the implementation of the methods. This article highlights the importance of considering classification consistency alongside accuracy when studying screening measures and provides the tools and guidance necessary for applied researchers to obtain classification consistency indices in their machine learning research on diagnostic assessments.
Public Significance Statement
Recently, machine learning methods have been used to predict from a screening measure whether individuals should be flagged for a condition (e.g., as depressed vs. not depressed), but it is unknown whether the models would provide consistent screening decisions if respondents were to complete the screening measure repeatedly. We propose statistical procedures to help researchers determine whether a machine learning model is providing consistent screening decisions. |
---|---|
ISSN: | 1040-3590 1939-134X |
DOI: | 10.1037/pas0001313 |
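
The summary describes estimating classification inconsistency that arises from sampling error during model fitting by resampling the data with the bootstrap. The sketch below illustrates that general idea only; it is not the article's R code or its exact procedure. The "model" is a deliberately simple stand-in (a sum-score cutoff chosen to maximize training accuracy), the data are simulated, and the names `fit_cutoff` and `bootstrap_consistency` are hypothetical. The second source of inconsistency named in the summary, measurement error in the item responses, is not covered here.

```python
import random


def fit_cutoff(scores, labels):
    """Stand-in 'model': the sum-score cutoff with the highest training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(scores)):
        acc = sum((s >= cut) == bool(y) for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:  # ties keep the lowest cutoff
            best_cut, best_acc = cut, acc
    return best_cut


def bootstrap_consistency(scores, labels, n_boot=200, seed=0):
    """Per-respondent share of bootstrap refits that agree with the majority decision."""
    rng = random.Random(seed)
    n = len(scores)
    flagged = [0] * n
    for _ in range(n_boot):
        # Resample respondents with replacement and refit the stand-in model.
        idx = [rng.randrange(n) for _ in range(n)]
        cut = fit_cutoff([scores[i] for i in idx], [labels[i] for i in idx])
        # Classify every original respondent with the refitted cutoff.
        for j, s in enumerate(scores):
            flagged[j] += s >= cut
    props = [f / n_boot for f in flagged]
    # Consistency is agreement with the more frequent decision, so it lies in [0.5, 1].
    return [max(p, 1 - p) for p in props]


# Simulated (hypothetical) data: about 30% true cases, who tend to score higher.
data_rng = random.Random(1)
labels = [data_rng.random() < 0.3 for _ in range(150)]
scores = [round(data_rng.gauss(14 if y else 8, 3)) for y in labels]

consistency = bootstrap_consistency(scores, labels)
mean_consistency = sum(consistency) / len(consistency)
print(f"mean classification consistency: {mean_consistency:.3f}")
```

Respondents whose scores sit near the cutoff will tend to have consistency values closer to 0.5, because small shifts in the refitted cutoff flip their classification. In a real analysis the stand-in cutoff would be replaced by the actual machine learning model being evaluated.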