ART: An Efficient Transformer with Atrous Residual Learning for Medical Images
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Convolutional neural networks (CNNs) have achieved great success in medical image processing. However, CNNs have been shown to lack the ability to capture global features because of their inherently local receptive fields. With their intrinsic ability to model long-range global dependencies, transformer neural networks have become an alternative architecture to CNNs. Current transformers, however, suffer from heavy parameter counts and complex training strategies, and rely on large-scale datasets and substantial computing resources. To address these problems, this manuscript proposes the Atrous Residual Transformer (ART), a cascade of four ART stages that is lightweight and well suited to medical image processing. An Atrous Projection module replaces the typical Linear Projection operation to generate embedding features efficiently, and an Atrous Transformer layer learns both global information and local features, while greatly reducing the number of model parameters (Params) and floating-point operations (FLOPs). Extensive experiments demonstrate that ART achieves remarkable performance compared with state-of-the-art CNNs and mainstream transformers, with lower FLOPs and fewer parameters. Trained from scratch on the COVID-CT dataset, ART achieves 81% accuracy, surpassing ResNet50 by 10%, EfficientNet-B6 by 5%, ViT-B by 9%, and DeiT-B by 9%, while using only 17M Params and 2.9G FLOPs.
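
The record gives only a high-level description of the Atrous Projection idea (dilated convolutions in place of a plain linear patch projection). The sketch below is a minimal, hypothetical illustration of that idea only; the class name, kernel sizes, dilation rates, and normalization are assumptions and are not taken from the ART paper.

```python
# Illustrative sketch only: the record does not specify ART's layer
# definitions, so every hyperparameter below is an assumption.
import torch
import torch.nn as nn

class AtrousProjection(nn.Module):
    """Patch-embedding variant that uses dilated (atrous) convolutions
    instead of a single linear projection of flattened patches."""
    def __init__(self, in_chans=3, embed_dim=64, stride=4, dilations=(1, 2, 3, 4)):
        super().__init__()
        # One branch per dilation rate; each branch sees a different
        # receptive field while producing the same output resolution.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_chans, embed_dim // len(dilations),
                      kernel_size=3, stride=stride,
                      padding=d, dilation=d)
            for d in dilations
        ])
        self.norm = nn.BatchNorm2d(embed_dim)

    def forward(self, x):
        # Concatenate the multi-scale embeddings along the channel axis.
        x = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.norm(x)

# Usage: a 224x224 image is downsampled by the stride and embedded.
tokens = AtrousProjection()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 64, 56, 56]) with stride=4
```

Compared with a single patch-flattening linear projection, the dilated branches enlarge each embedding token's receptive field without the parameter cost of larger kernels, which is consistent with the abstract's emphasis on low Params and FLOPs.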
ISSN: 1945-788X
DOI: 10.1109/ICME55011.2023.00327