Vision transformers for dense prediction: A survey

Bibliographic Details
Published in: Knowledge-Based Systems, 2022-10, Vol. 253, Article 109552
Main Authors: Zuo, Shuangquan; Xiao, Yun; Chang, Xiaojun; Wang, Xuanhong
Format: Article
Language:English
Description
Summary: Transformers have demonstrated impressive expressiveness and transfer capability in computer vision. Dense prediction is a fundamental problem in computer vision that is more challenging than general image-level prediction tasks. The inherent properties of transformers enable them to process feature representations at a stable and relatively high resolution, which precisely satisfies the demand of dense prediction tasks for finer-grained and more globally coherent predictions. Furthermore, compared with convolutional networks, transformer methods require minimal inductive bias and permit long-range information interaction. These strengths have contributed to exciting advances in dense prediction tasks that apply transformer networks. This survey provides a comprehensive overview of transformer models with a specific focus on dense prediction. We give a well-rounded view of state-of-the-art transformer-based approaches, explicitly emphasizing pixel-level prediction tasks, and consider transformer variants primarily from the network architecture perspective. We further propose a novel taxonomy that organizes these models according to their constructions. Subsequently, we examine specific optimization strategies for tackling bottleneck problems in dense prediction tasks, explore the commonalities and differences among these works, and provide multiple horizontal comparisons from an experimental point of view. Finally, we summarize several stubborn problems that continue to affect visual transformers and outline possible directions for future development.
Highlights:
• We provide a comprehensive review of state-of-the-art transformer methods.
• We focus on transformer-based methods for dense prediction tasks.
• We propose a model taxonomy according to architectures and optimizations.
• We conduct a systematic horizontal comparison of the surveyed methods.
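As a rough illustration of the properties the summary highlights (token-based processing at a fixed spatial resolution and global, long-range interaction), the following is a minimal sketch, not taken from the survey: a single ViT-style encoder block applied to patch tokens, followed by a reshape back to a dense feature map. The class name PatchSelfAttentionBlock and all shapes and hyperparameters are illustrative assumptions.

# Minimal sketch (not from the survey): one transformer encoder block over patch tokens.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class PatchSelfAttentionBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Every patch token attends to every other token: long-range interaction in one layer.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, tokens):
        # tokens: (batch, H*W, dim); the token count, and hence the spatial grid, is unchanged.
        x = self.norm1(tokens)
        attended, _ = self.attn(x, x, x)
        tokens = tokens + attended
        return tokens + self.mlp(self.norm2(tokens))

# The token grid keeps its resolution through every block, so a dense head can simply
# reshape (batch, H*W, dim) -> (batch, dim, H, W) and upsample to per-pixel predictions.
B, H, W, D = 2, 32, 32, 256
tokens = torch.randn(B, H * W, D)
out = PatchSelfAttentionBlock(D)(tokens)
feature_map = out.transpose(1, 2).reshape(B, D, H, W)  # same spatial grid as the input tokens

Because the token grid is never downsampled inside the block, a dense prediction head only needs to reshape and upsample, which is the property the summary contrasts with the progressive downsampling of convolutional backbones.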
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2022.109552