Video Action Transformer Network

We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution,...

Bibliographic Details
Main Authors: Girdhar, Rohit; Carreira, Joao; Doersch, Carl; Zisserman, Andrew
Format: Conference Proceeding
Language: English