A Comparative Analysis of Vision Transformers and Convolutional Neural Networks in Cardiac Image Segmentation

Bibliographic Details
Main Authors: Granizo, Sebastion, Baldeon-Calisto, Maria, Iniguez, Milena, Navarrete, Danny, Riofrio, Daniel, Perez-Perez, Noel, Benitez, Diego, Flores-Moyano, Ricardo
Format: Conference Proceeding
Language: English
Description
Summary: In recent years, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have emerged as the dominant methods for automated cardiac image segmentation. CNNs are efficient architectures that capture local spatial patterns, whereas ViTs can model long-range global dependencies. Each type of network has been shown to perform better on certain tasks and datasets. In this work, we conducted a comparative analysis of ViTs and CNNs in the context of cardiac image segmentation. We statistically evaluated the performance of five CNN and ViT architectures using the publicly available Automated Cardiac Diagnosis Challenge (ACDC) MRI dataset. Based on a one-way ANOVA and Tukey's test, our analysis indicates that CNNs outperform Transformers in segmenting the right ventricle cavity, the left ventricle cavity, and the left ventricle myocardium. Furthermore, CNN architectures tend to be smaller and easier to train. Among all the networks considered, LinkNet achieves the highest performance, with a mean Dice score of 0.8965 and a mean average symmetric surface distance (ASSD) of 0.2960.
ISSN: 2768-1831
DOI: 10.1109/ISDFS60797.2024.10527254