Multiscale Vision Transformers
We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models. Multiscale Transformers have several channel-resolution scale stages. Starting from the input resolution and a small channel dim...
| Format: | Conference Proceeding |
|---|---|
| Language: | English |