Development of an initial training and evaluation programme for manual lower limb muscle MRI segmentation
Published in: | European radiology experimental 2024-07, Vol.8 (1), p.85-12, Article 85 |
---|---|
Format: | Article |
Language: | English |
Summary: | Background
Magnetic resonance imaging (MRI) quantification of intramuscular fat accumulation is a responsive biomarker in neuromuscular diseases. Despite emergence of automated methods, manual muscle segmentation remains an essential foundation. We aimed to develop a training programme for new observers to demonstrate competence in lower limb muscle segmentation and establish reliability benchmarks for future human observers and machine learning segmentation packages.
Methods
The learning phase of the training programme comprised a training manual, direct instruction, and eight lower limb MRI scans with reference standard large and small regions of interest (ROIs). The assessment phase used test–retest scans from two patients and two healthy controls. Interscan and interobserver reliability metrics were calculated to identify underperforming outliers and to determine competency benchmarks.
Results
Three experienced observers undertook the assessment phase, whilst eight new observers completed the full training programme. Two of the new observers were identified as underperforming outliers, relating to variation in size or consistency of segmentations; six had interscan and interobserver reliability equivalent to those of experienced observers. The calculated benchmark for the Sørensen-Dice similarity coefficient between observers was greater than 0.87 and 0.92 for individual thigh and calf muscles, respectively. Interscan and interobserver reliability were significantly higher for large than small ROIs (all p
|
---|---|
ISSN: | 2509-9280 |
DOI: | 10.1186/s41747-024-00475-9 |
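
The Summary reports interobserver benchmarks as Sørensen-Dice similarity coefficients. For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, 2|A∩B| / (|A| + |B|), applied to two binary segmentation masks. The function name, the flat 0/1 list representation, the toy masks, and the handling of two empty masks are illustrative assumptions and are not taken from the article.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice similarity between two binary masks.

    Masks are assumed to be equal-length flat sequences of 0/1 voxel labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy example: two observers outline the same muscle with a one-voxel shift.
observer_1 = [0, 1, 1, 1, 1, 0, 0, 0]
observer_2 = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(observer_1, observer_2))  # 3 shared voxels, 4 + 4 labelled: 0.75
```

A value of 1.0 indicates identical masks and 0.0 indicates no overlap, so the reported benchmarks of 0.87 (thigh) and 0.92 (calf) sit near the upper end of the scale.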