Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dim...

Bibliographic Details
Main Authors: Fan, Haoqi; Xiong, Bo; Mangalam, Karttikeya; Li, Yanghao; Yan, Zhicheng; Malik, Jitendra; Feichtenhofer, Christoph
Format: Conference Proceeding
Language: English