
Fuzzy-ViT: A Deep Neuro-Fuzzy System for Cross-Domain Transfer Learning from Large-scale General Data to Medical Image


Bibliographic Details
Published in: IEEE Transactions on Fuzzy Systems, 2024, p. 1-12
Main Authors: Li, Qiankun; Wang, Yimou; Zhang, Yani; Zuo, Zhaoyu; Chen, Junxin; Wang, Wei
Format: Article
Language: English
Description
Summary: The surge in general-purpose visual big data has notably advanced data-driven, deep learning-based computer vision. Transformer-based methods excel in this era of big data because their attention-based architecture benefits from massive training data. However, medical images are difficult to obtain, so the medical field still faces a limited-data challenge. In this paper, we propose a novel deep neuro-fuzzy system named Fuzzy-ViT, which integrates fuzzy logic with the Vision Transformer (ViT) for cross-domain transfer learning from large-scale general data to the medical image domain. Specifically, Fuzzy-ViT uses a ViT backbone pre-trained on extensive general datasets such as ImageNet-21K, LAION-400M, and LAION-2B to extract rich general features. A Fuzzy Attention Cross-Domain Module (FACM) then transfers these general features to medical features, thereby enhancing medical image analysis. Thanks to the Fuzzy System Transitioner (FST) in FACM, fuzzy and uninterpretable general-domain features can be effectively converted into those needed in the medical domain. In addition, the Attention Mechanism Smoother (AMS) in FACM smooths the conversion outcomes, ensuring a harmonious integration of the fuzzy system with the neural network architecture. Experimental results demonstrate that the proposed Fuzzy-ViT achieves state-of-the-art performance on popular medical image benchmarks (BreakHis and HCRF), with F1 scores of 93.37% and 97.22%, respectively. Detailed ablation analysis demonstrates the effectiveness of our method for bridging large-scale general visual data and medical images.
ISSN: 1063-6706, 1941-0034
DOI: 10.1109/TFUZZ.2024.3400861
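
The summary describes a three-stage flow: general-domain features from a pre-trained backbone, a fuzzy transition step, then attention-based smoothing. The record gives no implementation details, so the following is only a minimal illustrative sketch of that flow and not the authors' FACM. Everything in it is an assumption made for illustration: the function names, the Gaussian membership functions, the zero-order Takagi-Sugeno defuzzification used as a stand-in for the FST, the plain dot-product self-attention used as a stand-in for the AMS, and the toy values for `CENTERS`, `SIGMA`, `CONSEQUENTS`, and `general_tokens`.

```python
import math

# Hypothetical fuzzy-set parameters (not taken from the paper).
CENTERS = [-1.0, 0.0, 1.0]       # centers of three Gaussian fuzzy sets
SIGMA = 0.5                      # shared width of the fuzzy sets
CONSEQUENTS = [-1.0, 0.0, 1.0]   # one crisp output value per fuzzy rule


def gaussian_memberships(x, centers, sigma):
    """Degree to which scalar x belongs to each Gaussian fuzzy set."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]


def fuzzy_transition(token, centers, sigma, consequents):
    """Assumed stand-in for a fuzzy transition step.

    Each feature is fuzzified, and the output is the membership-weighted
    average of the rule consequents (zero-order Takagi-Sugeno style).
    """
    out = []
    for x in token:
        mu = gaussian_memberships(x, centers, sigma)
        total = sum(mu) + 1e-12  # guard against division by zero
        out.append(sum(m * c for m, c in zip(mu, consequents)) / total)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_smooth(tokens):
    """Assumed stand-in for attention smoothing.

    Each token is replaced by a softmax-weighted average of all tokens,
    with weights from scaled dot-product similarity.
    """
    dim = len(tokens[0])
    smoothed = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        smoothed.append([
            sum(weights[j] * tokens[j][i] for j in range(len(tokens)))
            for i in range(dim)
        ])
    return smoothed


# Toy "general-domain" patch features; in practice these would come from
# a pre-trained ViT backbone.
general_tokens = [[0.2, -1.3, 0.7], [1.1, 0.4, -0.5]]
converted = [
    fuzzy_transition(t, CENTERS, SIGMA, CONSEQUENTS) for t in general_tokens
]
medical_tokens = attention_smooth(converted)
```

In this sketch both stages are convex combinations, so every output value stays within the range spanned by `CONSEQUENTS`. A trainable version would learn the centers, widths, and consequents jointly with the network.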