Deep-learning-based segmentation of the vocal tract and articulators in real-time magnetic resonance images of speech
Published in: Computer Methods and Programs in Biomedicine, 2021-01, Vol. 198, Article 105814
Format: Article
Language: English
Summary:
• A method to segment the vocal tract and articulators in MR images was developed.
• Median accuracy: Dice coefficient of 0.92; general Hausdorff distance of 5 mm.
• Developed to facilitate quantitative analysis of the vocal tract and articulators.
• Intended for use in clinical and non-clinical studies of speech.
• A novel clinically relevant segmentation accuracy metric was also developed.
Magnetic resonance (MR) imaging is increasingly used in studies of speech as it enables non-invasive visualisation of the vocal tract and articulators, thus providing information about their shape, size, motion and position. Extraction of this information for quantitative analysis is achieved using segmentation. Methods have been developed to segment the vocal tract; however, none of them also fully segments the articulators. The objective of this work was to develop a method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech, thus overcoming the limitations of existing methods.
Five speech MR image sets (392 MR images in total), each of a different healthy adult volunteer, were used in this work. A fully convolutional network with an architecture similar to the original U-Net was developed to segment the following six regions in the image sets: the head, soft palate, jaw, tongue, vocal tract and tooth space. A five-fold cross-validation was performed to investigate the segmentation accuracy and generalisability of the network. The segmentation accuracy was assessed using standard metrics (the overlap-based Dice coefficient and the boundary-based general Hausdorff distance) and a novel clinically relevant metric based on velopharyngeal closure.
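The two standard metrics named above can be sketched for a pair of binary masks as follows. This is a minimal NumPy/SciPy illustration, not the authors' implementation; it assumes the symmetric maximum-distance form of the Hausdorff metric computed over foreground pixel coordinates, reported in pixel units rather than millimetres.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks (1.0 = perfect overlap)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between the foreground pixel
    coordinates of two binary masks, in pixel units."""
    p = np.argwhere(pred.astype(bool))
    t = np.argwhere(truth.astype(bool))
    return max(directed_hausdorff(p, t)[0], directed_hausdorff(t, p)[0])

# Toy example: two 4x4 masks that differ by one foreground pixel.
pred = np.zeros((4, 4), dtype=np.uint8)
truth = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:3] = 1   # 2x2 square
truth[1:3, 1:3] = 1  # same square ...
truth[3, 3] = 1      # ... plus one extra pixel
print(round(dice_coefficient(pred, truth), 3))  # → 0.889
```

In practice, the pixel-unit distance would be scaled by the in-plane pixel spacing of the MR images to obtain a value in millimetres, as reported in the paper.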
The segmentations created by the method had a median Dice coefficient of 0.92 and a median general Hausdorff distance of 5 mm. The method segmented the head most accurately (median Dice coefficient of 0.99), and the soft palate and tooth space least accurately (median Dice coefficients of 0.92 and 0.93, respectively). The segmentations created by the method correctly showed 90% (27 out of 30) of the velopharyngeal closures in the MR image sets.
An automatic method to fully segment multiple groups of articulators as well as the vocal tract in two-dimensional MR images of speech was successfully developed. The method is intended for use in clinical and non-clinical speech studies which involve quantitative analysis of the shape, size, motion and position of the vocal tract and articulators.
ISSN: 0169-2607, 1872-7565
DOI: 10.1016/j.cmpb.2020.105814