Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dim...

Bibliographic Details
Main Authors:	Fan, Haoqi; Xiong, Bo; Mangalam, Karttikeya; Li, Yanghao; Yan, Zhicheng; Malik, Jitendra; Feichtenhofer, Christoph
Format:	Conference Proceeding
Language:	English
Subjects:	Action and behavior recognition; Codes; Complexity theory; Computational modeling; Computer vision; Image recognition; Recognition and classification; Transformers; Video analysis and understanding; Visualization
