Enhancing representation in radiography-reports foundation model: a granular alignment algorithm using masked contrastive learning

Bibliographic Details
Published in: Nature Communications 2024-09, Vol. 15 (1), p. 7620-12, Article 7620
Main Authors: Huang, Weijian, Li, Cheng, Zhou, Hong-Yu, Yang, Hao, Liu, Jiarun, Liang, Yong, Zheng, Hairong, Zhang, Shaoting, Wang, Shanshan
Format: Article
Language: English
Description
Summary: Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges, such as the need for fine-grained knowledge understanding in computer-aided diagnosis and the ability to work with very limited or even no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo explores masked contrastive learning to simultaneously achieve fine-grained image understanding and zero-shot learning for a variety of medical imaging tasks. It introduces a correlation weighting mechanism that adjusts the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model’s representation learning capabilities. To evaluate the performance of MaCo, we conducted extensive experiments using 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks. Multi-modal foundation models are increasingly important in medical applications. Here, the authors present a masked contrastive chest X-ray model that achieves fine-grained image understanding and zero-shot capabilities, outperforming existing methods.
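The summary describes the approach only at a high level, so the following is a minimal, generic sketch of the idea rather than the MaCo implementation: image patches are randomly masked, the visible patches are pooled with per-patch weights, and the pooled image embedding is aligned with the report embedding through a symmetric contrastive (InfoNCE-style) loss. All function names, the `patch_weights` argument (a stand-in for a correlation weighting of patches), the mask ratio, and the temperature are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean array; True marks patches that stay visible."""
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    keep_idx = rng.permutation(num_patches)[:num_keep]
    visible = np.zeros(num_patches, dtype=bool)
    visible[keep_idx] = True
    return visible


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def masked_contrastive_loss(patch_emb, text_emb, visible,
                            patch_weights=None, temperature=0.07):
    """Symmetric contrastive loss between masked images and reports.

    patch_emb:     (B, P, D) patch embeddings (hypothetical image encoder output)
    text_emb:      (B, D)    report embeddings (hypothetical text encoder output)
    visible:       (B, P)    boolean, True for unmasked patches
    patch_weights: (B, P)    optional per-patch weights (assumed stand-in for a
                             correlation weighting); uniform if None
    """
    if patch_weights is None:
        patch_weights = np.ones(visible.shape)
    # Masked patches contribute nothing to the pooled image embedding.
    w = patch_weights * visible
    w = w / w.sum(axis=1, keepdims=True)
    image_emb = l2_normalize((w[..., None] * patch_emb).sum(axis=1))
    text_emb = l2_normalize(text_emb)

    # Pairwise cosine similarities; matching pairs lie on the diagonal.
    logits = image_emb @ text_emb.T / temperature
    diag = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

In this sketch, a batch whose report embeddings match their own images yields a lower loss than the same batch with reports assigned to the wrong images, which is the behavior a contrastive objective is meant to reward.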
ISSN: 2041-1723
DOI: 10.1038/s41467-024-51749-0