Deblurring masked image modeling for ultrasound image analysis

Bibliographic Details
Published in: Medical Image Analysis, 2024-10, Vol. 97, p. 103256, Article 103256
Main Authors: Kang, Qingbo; Lao, Qicheng; Gao, Jun; Liu, Jingyan; Yi, Huahui; Ma, Buyun; Zhang, Xiaofan; Li, Kang
Format: Article
Language: English
Description
Summary: Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, the study of MIM for ultrasound imaging remains relatively unexplored, and most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as the intrinsic imaging characteristics of the ultrasound modality, such as the high noise-to-signal ratio. In this paper, motivated by the unique high noise-to-signal ratio property in ultrasound, we propose a deblurring MIM approach specialized to ultrasound, which incorporates a deblurring task into the pretraining proxy task. The incorporation of deblurring facilitates the pretraining to better recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for the pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodule, Hashimoto’s thyroiditis) and task types (classification, segmentation). The experimental results demonstrate the efficacy of the proposed deblurring MIM, achieving state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
• We propose a specialized deblurring masked image modeling (MIM) approach tailored for ultrasound image analysis by incorporating a deblurring task into the pretraining proxy task.
• We utilize a multi-scale hierarchical encoder architecture, enabling the extraction of both fine- and coarse-grained image representations.
• We conduct pretraining experiments on 280,000 thyroid ultrasound images, and the downstream tasks encompass various disease diagnoses (nodule and Hashimoto’s thyroiditis), as well as different types of tasks (classification and segmentation).
• As the first MIM-based work for ultrasound, we delve into fundamental considerations for modality-specific foundation models, yielding significant conclusions.
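The summary describes adding a deblurring task to the masked image modeling proxy task. The sketch below is one way such a training sample might be built, and it is not the authors' implementation: the box blur, the zero-filled mask, the patch size, the mask ratio, and the masked L1 loss are all assumptions chosen for illustration, as are the function names. The only idea taken from the record is that the model sees a degraded, masked input and is asked to recover the original sharp image.

```python
import random


def box_blur(img, radius=1):
    """Mean-filter a 2D list of floats; the window is clipped at the borders.

    Assumption: a box blur stands in for whatever blur the paper uses.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def random_patch_mask(h, w, patch, mask_ratio, rng):
    """Return a 2D 0/1 mask; 1 marks every pixel of a masked patch."""
    rows, cols = h // patch, w // patch
    n_patches = rows * cols
    n_masked = int(round(mask_ratio * n_patches))
    masked = set(rng.sample(range(n_patches), n_masked))
    mask = [[0] * w for _ in range(h)]
    for idx in masked:
        py, px = divmod(idx, cols)
        for y in range(py * patch, (py + 1) * patch):
            for x in range(px * patch, (px + 1) * patch):
                mask[y][x] = 1
    return mask


def make_deblur_mim_sample(sharp, patch=2, mask_ratio=0.5, radius=1, seed=0):
    """Build (model_input, mask, target) for one hypothetical training step.

    model_input: blurred image with masked patches set to zero.
    target: the original sharp image, so reconstruction implies deblurring.
    """
    rng = random.Random(seed)
    h, w = len(sharp), len(sharp[0])
    blurred = box_blur(sharp, radius)
    mask = random_patch_mask(h, w, patch, mask_ratio, rng)
    model_input = [
        [0.0 if mask[y][x] else blurred[y][x] for x in range(w)]
        for y in range(h)
    ]
    return model_input, mask, sharp


def masked_l1(pred, target, mask):
    """Mean absolute error over masked pixels only (an assumed loss choice)."""
    total, count = 0.0, 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


# Tiny 4x4 "image" with a bright 2x2 block in the top-left corner.
sharp_image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
model_input, mask, target = make_deblur_mim_sample(sharp_image)
```

In a real pipeline the `model_input` would be fed to the encoder and the decoder output compared against `target`; whether the loss covers only masked patches or the whole image is a design choice this record does not specify.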
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2024.103256