
Encoder-decoder models for chest X-ray report generation perform no better than unconditioned baselines

Bibliographic Details
Published in:	PLOS ONE 2021-11, Vol.16 (11), p.e0259639
Main Authors:	Babar, Zaheer; van Laarhoven, Twan; Marchiori, Elena
Format:	Article
Language:	English
Subjects:	Algorithms; Automation; Chest; Coders; Computer and Information Sciences; Diagnostic systems; Encoders-Decoders; Health aspects; Image quality; Information science; Keywords; Learning algorithms; Machine learning; Medical diagnosis; Medical imaging; Medicine and Health Sciences; Model accuracy; Neural networks; People and Places; Radiology; Research and Analysis Methods; Semantics; Social Sciences; Tomography, X-Ray Computed; X-rays

Description
Summary:	High-quality radiology reporting of chest X-ray images is of core importance for high-quality patient diagnosis and care. Automatically generated reports can assist radiologists by reducing their workload and may even prevent errors. Machine Learning (ML) models for this task take an X-ray image as input and output a sequence of words. In this work, we show that ML models for this task based on the popular encoder-decoder approach, such as 'Show, Attend and Tell' (SA&T), perform similarly to or worse than models that do not use the input image, called unconditioned baselines. An unconditioned model achieved a diagnostic accuracy of 0.91 on the IU chest X-ray dataset and significantly outperformed SA&T (0.877) and other popular ML models (p-value < 0.001). This unconditioned model also outperformed SA&T and similar ML methods on the BLEU-4 and METEOR metrics. In addition, an unconditioned version of SA&T, obtained by permuting the reports generated from images of the test set, achieved a diagnostic accuracy of 0.862, comparable to that of SA&T (p-value ≥ 0.05).
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0259639
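
The summary describes building an unconditioned version of a model by permuting its generated reports across the test images. The following is a minimal sketch of that idea, not the authors' code. The function names, the keyword-based `label_report` labeler, and the toy data are all hypothetical and chosen only for illustration; the article's actual diagnostic accuracy metric is not specified in this record.

```python
import random


def permute_reports(generated_reports, seed=0):
    """Shuffle generated reports across test images, breaking the image-report link."""
    rng = random.Random(seed)
    permuted = list(generated_reports)
    rng.shuffle(permuted)
    return permuted


def label_report(report):
    """Hypothetical stand-in labeler: 'normal' if the report mentions no acute finding."""
    return "normal" if "no acute" in report.lower() else "abnormal"


def diagnostic_accuracy(reports, true_labels):
    """Fraction of reports whose derived label matches the reference label."""
    hits = sum(label_report(r) == y for r, y in zip(reports, true_labels))
    return hits / len(true_labels)


# Toy data, invented for illustration only.
generated = [
    "No acute cardiopulmonary abnormality.",
    "Right lower lobe opacity.",
    "No acute disease.",
    "Cardiomegaly with small effusion.",
]
truth = ["normal", "abnormal", "normal", "abnormal"]

# Accuracy with reports matched to their own images.
conditioned_acc = diagnostic_accuracy(generated, truth)
# Accuracy after the reports are reassigned at random.
permuted_acc = diagnostic_accuracy(permute_reports(generated, seed=1), truth)
```

If the permuted score stays close to the matched score, the generated reports carry little image-specific information. Note that a plain shuffle can leave some reports paired with their original image, so a stricter variant might require that every report moves.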