Loading…

X-Net: a dual encoding–decoding method in medical image segmentation

Medical image segmentation has the priori guiding significance for clinical diagnosis and treatment. In the past ten years, a large number of experimental facts have proved the great success of deep convolutional neural networks in various medical image segmentation tasks. However, the convolutional...

Full description

Saved in:
Bibliographic Details
Published in:The Visual computer 2023-06, Vol.39 (6), p.2223-2233
Main Authors: Li, Yuanyuan, Wang, Ziyu, Yin, Li, Zhu, Zhiqin, Qi, Guanqiu, Liu, Yu
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Medical image segmentation has the priori guiding significance for clinical diagnosis and treatment. In the past ten years, a large number of experimental facts have proved the great success of deep convolutional neural networks in various medical image segmentation tasks. However, the convolutional networks seem to focus too much on the local image details, while ignoring the long-range dependence. The Transformer structure can encode long-range dependencies in image and learn high-dimensional image information through the self-attention mechanism. But this structure currently depends on the database scale to give full play to its excellent performance, which limits its application in medical images with limited database size. In this paper, the characteristics of CNNs and Transformer are integrated to propose a dual encoding–decoding structure of the X-shaped network (X-Net). It can serve as a good alternative to the traditional pure convolutional medical image segmentation network. In the encoding phase, the local and global features are simultaneously extracted by two types of encoders, convolutional downsampling, and Transformer and then merged through jump connection. In the decoding phase, a variational auto-encoder branch is added to reconstruct the input image itself in order to weaken the impact of insufficient data. Comparative experiments on three medical image datasets show that X-Net can realize the organic combination of Transformer and CNNs.
ISSN:0178-2789
1432-2315
DOI:10.1007/s00371-021-02328-7