Pure Versus Hybrid Transformers For Multi-Modal Brain Tumor Segmentation: A Comparative Study

Bibliographic Details
Main Authors: Andrade-Miranda, G., Jaouen, V., Bourbonne, V., Lucia, F., Visvikis, D., Conze, P.-H.
Format: Conference Proceeding
Language: English
Description
Summary: Vision Transformers (ViT)-based models are witnessing an exponential growth in the medical imaging community. Among desirable properties, ViTs provide a powerful modeling of long-range pixel relationships, contrary to inherently local convolutional neural networks (CNN). These emerging models can be categorized either as hybrid-based when used in conjunction with CNN layers (CNN-ViT) or purely Transformers-based. In this work, we conduct a comparative quantitative analysis to study the differences between a range of available Transformers-based models using controlled brain tumor segmentation experiments. We also investigate to what extent such models could benefit from modality interaction schemes in a multi-modal setting. Results on the publicly-available BraTS2021 dataset show that hybrid-based pipelines generally tend to outperform simple Transformers-based models. In these experiments, no particular improvement using multi-modal interaction schemes was observed.
ISSN: 2381-8549
DOI: 10.1109/ICIP46576.2022.9897658