Public Covid-19 X-ray datasets and their impact on model bias – A systematic review of a significant problem
Published in: | Medical image analysis 2021-12, Vol.74, p.102225-102225, Article 102225 |
---|---|
Format: | Article |
Language: | English |
Summary: | • First systematic review on publicly available COVID-19 X-ray imaging datasets.
• Emphasis on dataset characteristics relevant for risk of inducing bias/confounding to models.
• The most popular datasets were found at very high risk of inducing bias to models.
• Providing guidance for efficient medical image analysis using COVID-19 X-ray imaging datasets.
Computer-aided diagnosis and stratification of COVID-19 based on chest X-rays suffer from weak bias assessment and limited quality control. Undetected bias induced by inappropriate use of datasets and improper consideration of confounders prevent the translation of prediction models into clinical practice. By adapting established tools for model evaluation to the task of evaluating datasets, this study provides a systematic appraisal of publicly available COVID-19 chest X-ray datasets, determining their potential use and evaluating potential sources of bias. Only 9 out of more than a hundred identified datasets met at least the criteria for proper assessment of risk of bias and could be analysed in detail. Remarkably, most of the datasets utilised in 201 papers published in peer-reviewed journals are not among these 9 datasets, which leads to models with a high risk of bias. This raises concerns about the suitability of such models for clinical use. This systematic review highlights the limited description of datasets employed for modelling and helps researchers select the most suitable datasets for their task. |
---|---|
ISSN: | 1361-8415 1361-8423 |
DOI: | 10.1016/j.media.2021.102225 |