Convolutional Neural Networks for Automated Classification of Prostate Multiparametric Magnetic Resonance Imaging Based on Image Quality
Published in: Journal of Magnetic Resonance Imaging, 2022-02, Vol. 55 (2), pp. 480-490
Format: Article
Language: English
Summary:
Background
Prostate magnetic resonance imaging (MRI) is technically demanding, requiring high image quality to reach its full diagnostic potential. An automated method to identify diagnostically inadequate images could help optimize image quality.
Purpose
To develop a convolutional neural network (CNN)‐based analysis pipeline for classifying prostate MRI image quality.
Study Type
Retrospective.
Subjects
Three hundred sixteen prostate multiparametric MRI (mpMRI) scans from 312 men (median age, 67 years).
Field Strength/Sequence
3 T; fast spin‐echo T2‐weighted imaging (T2WI), echo‐planar diffusion‐weighted imaging (DWI), apparent diffusion coefficient (ADC) maps, and gradient‐echo dynamic contrast‐enhanced (DCE) imaging.
Assessment
MRI scans were reviewed by three genitourinary radiologists (V.P., M.D.M., S.C.) with 21, 12, and 5 years of experience, respectively. Sequences were labeled as high quality (Q1) or low quality (Q0) and used as the reference standard for all analyses.
Statistical Tests
Sequences were split into training, validation, and testing sets (869, 250, and 120 sequences, respectively). Inter‐reader agreement was assessed with the Fleiss kappa. After preprocessing and data augmentation, 28 CNNs were trained on MRI slices for each sequence. Model performance was assessed on both a per‐slice and a per‐sequence basis. A pairwise t‐test was performed to compare the performance of the classifiers.
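Fleiss' kappa extends chance‐corrected agreement to more than two readers, which fits three radiologists each assigning Q0 or Q1 to every sequence. The sketch below is a plain‐Python version of the standard formula, not the authors' code. The verbal bands follow the Landis and Koch (1977) scale, whose terms ("substantial", "almost perfect") match the wording of the results, although the abstract does not name the scale it used. The ratings matrix is invented for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] = number of readers who assigned subject i to category j.
    Every row must sum to the same number of readers n (n >= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing reader pairs per subject, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the pooled category proportions.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")

    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label on the Landis and Koch (1977) agreement scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented example: 8 sequences, 3 readers; columns are [votes for Q0, votes for Q1].
ratings = [[0, 3], [0, 3], [0, 3], [3, 0], [1, 2], [0, 3], [3, 0], [0, 3]]
k = fleiss_kappa(ratings)
print(round(k, 3), landis_koch(k))  # 0.798 substantial
```

A single split vote (the `[1, 2]` row) is enough to pull this toy set just below the "almost perfect" band, which illustrates how sensitive kappa is when one class (here Q0) is rare.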
Results
The number of sequences labeled as Q0 vs. Q1 was 38 vs. 278 for T2WI, 43 vs. 273 for DWI, 41 vs. 275 for ADC, and 38 vs. 253 for DCE. Inter‐reader agreement was almost perfect for T2WI and DCE and substantial for DWI and ADC. In the per‐slice analysis, accuracy was 89.95% ± 0.02% for T2WI, 79.83% ± 0.04% for DWI, 76.64% ± 0.04% for ADC, and 96.62% ± 0.01% for DCE. In the per‐sequence analysis, accuracy was 100% ± 0.00% for T2WI, DWI, and DCE, and 92.31% ± 0.00% for ADC. The three best algorithms performed significantly better than the remaining ones on every sequence (P‐value …
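The abstract does not state how slice‐level predictions were combined into a per‐sequence label. One common option, assumed here purely for illustration, is a majority vote over the slices of each sequence. It shows how per‐sequence accuracy can exceed per‐slice accuracy: a few misclassified slices are outvoted by the rest. The function names, the assumption that every slice inherits its sequence's reference label, the tie rule (ties resolve to Q0), and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Each record: (sequence_id, predicted slice label); 1 = Q1 (high quality), 0 = Q0.
SlicePred = Tuple[str, int]


def slice_accuracy(slice_preds: Sequence[SlicePred], seq_labels: Dict[str, int]) -> float:
    """Per-slice accuracy, assuming every slice inherits its sequence's reference label."""
    correct = sum(pred == seq_labels[seq_id] for seq_id, pred in slice_preds)
    return correct / len(slice_preds)


def sequence_votes(slice_preds: Sequence[SlicePred]) -> Dict[str, int]:
    """Hypothetical majority vote: Q1 only if strictly more than half the slices are Q1."""
    votes: Dict[str, List[int]] = defaultdict(list)
    for seq_id, pred in slice_preds:
        votes[seq_id].append(pred)
    return {seq_id: int(2 * sum(v) > len(v)) for seq_id, v in votes.items()}


def sequence_accuracy(slice_preds: Sequence[SlicePred], seq_labels: Dict[str, int]) -> float:
    """Per-sequence accuracy of the majority-voted labels."""
    voted = sequence_votes(slice_preds)
    return sum(voted[s] == label for s, label in seq_labels.items()) / len(seq_labels)


# Toy data (invented, not from the study): one Q1 sequence "A" and one Q0 sequence "B".
labels = {"A": 1, "B": 0}
preds = [("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 1),
         ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
print(round(slice_accuracy(preds, labels), 3))  # 0.778
print(sequence_accuracy(preds, labels))         # 1.0
```

Resolving ties toward Q0 is a conservative choice if Q0‐flagged sequences are meant to be rechecked or repeated; a mean‐probability threshold over slices would be an equally plausible alternative.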
ISSN: 1053-1807; 1522-2586
DOI: 10.1002/jmri.27879