Encoder-decoder CNN models for automatic tracking of tongue contours in real-time ultrasound data

Bibliographic Details
Published in:Methods (San Diego, Calif.), 2020-07, Vol.179, p.26-36
Main Authors: Hamed Mozaffari, M., Lee, Won-Sook
Format: Article
Language:English
Description
Summary:
•A combination of standard and dilated convolution provides sharper segmentation results.
•Tongue contours can be automatically tracked in real time from ultrasound data using deep learning methods.
•BowNet models can delineate a region of interest from input data using the capability of dilated convolution.
•A large number of learnable parameters in a CNN model is not always an indication of better generalization.
•Ground truth labels of ultrasound data should be in binary instead of gray-scale format.

One application of medical ultrasound imaging is to visualize and characterize human tongue shape and motion in real time to study healthy or impaired speech production. Because ultrasound images are low-contrast and noisy, users need knowledge of tongue structure and of ultrasound data interpretation to recognize tongue locations and gestures easily. Moreover, quantitative analysis of tongue motion requires the tongue contour, rather than the whole tongue region, to be extracted, tracked, and visualized. Manual tongue contour extraction is a cumbersome, subjective, and error-prone task. Furthermore, it is not feasible for real-time applications, where the tongue contour moves rapidly with nuanced gestures. This paper presents two new deep neural networks (named BowNet models) that benefit from the global prediction ability of encoding–decoding fully convolutional neural networks and the full-resolution extraction capability of dilated convolutions. Both qualitative and quantitative studies over datasets from two ultrasound machines demonstrated the outstanding performance of the proposed deep learning models in terms of speed and robustness. Experimental results also revealed a significant improvement in the accuracy of prediction maps due to the better exploration and exploitation ability of the proposed network models.
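The summary credits dilated convolution with full-resolution feature extraction. As a rough illustration of that general idea only (this is not the BowNet architecture, and the function names `dilated_conv1d` and `receptive_field` are hypothetical), the following pure-Python sketch shows how spacing out the kernel taps widens the receptive field without adding weights or downsampling the input.

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output sample."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with `dilation - 1` gaps between taps.

    Illustrative only: real models use 2D convolutions from a deep learning
    framework, not this loop.
    """
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Tap i of the kernel reads the input at offset i * dilation.
        out.append(sum(k * signal[start + i * dilation]
                       for i, k in enumerate(kernel)))
    return out


# A single impulse makes the sampling pattern visible.
signal = [0, 0, 0, 1, 0, 0, 0, 0, 0]
kernel = [1, 1, 1]

standard = dilated_conv1d(signal, kernel, dilation=1)  # taps are adjacent
dilated = dilated_conv1d(signal, kernel, dilation=2)   # taps skip one sample

print(receptive_field(3, 1), standard)
print(receptive_field(3, 2), dilated)
```

With the same three weights, the dilated version covers five input samples per output instead of three, which is the property that lets a network see wider context while keeping the feature map at the input resolution.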
ISSN:1046-2023
1095-9130
DOI:10.1016/j.ymeth.2020.05.011