Video Action Transformer Network

We introduce the Action Transformer model for recognizing and localizing human actions in video clips. We repurpose a Transformer-style architecture to aggregate features from the spatiotemporal context around the person whose actions we are trying to classify. We show that by using high-resolution,...
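The core idea in the abstract is that a person-specific query gathers evidence from the surrounding spatiotemporal context through Transformer-style attention. The sketch below illustrates only that attention step, under stated assumptions. It is not the authors' implementation, and it omits the rest of a full Transformer unit. The person feature vector, the `(T, H, W, d)` context feature map, the projection matrices and the helper names `softmax` and `person_context_attention` are all hypothetical. Every location in the clip becomes one key/value token, and the person query produces a softmax-normalized weighting over those tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def person_context_attention(person_feat, context_feats, w_q, w_k, w_v):
    """Hypothetical sketch: attend from one person query to clip context.

    person_feat:   (d,) feature describing the person (assumed input)
    context_feats: (T, H, W, d) spatiotemporal feature map (assumed input)
    w_q, w_k, w_v: (d, d_k) projection matrices (assumed parameters)
    """
    d = context_feats.shape[-1]
    ctx = context_feats.reshape(-1, d)      # (T*H*W, d): one token per location
    q = person_feat @ w_q                   # (d_k,) query from the person
    k = ctx @ w_k                           # (N, d_k) keys from the context
    v = ctx @ w_v                           # (N, d_k) values from the context
    scores = k @ q / np.sqrt(q.shape[0])    # scaled dot-product scores, (N,)
    weights = softmax(scores)               # attention distribution over locations
    return weights @ v, weights             # aggregated context (d_k,), weights (N,)

# Usage with random stand-in features (shapes are illustrative only).
rng = np.random.default_rng(0)
T, H, W, d, dk = 4, 7, 7, 16, 8
person = rng.standard_normal(d)
ctx = rng.standard_normal((T, H, W, d))
wq, wk, wv = (rng.standard_normal((d, dk)) for _ in range(3))
out, w = person_context_attention(person, ctx, wq, wk, wv)
print(out.shape, w.shape)  # (8,) (196,)
```

Because the weights span all T*H*W locations, the aggregated vector can draw on regions anywhere in the clip, not only inside the person's box. This is the sense in which the model uses context around the person.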


Bibliographic Details
Main Authors:	Girdhar, Rohit; Carreira, Joao; Doersch, Carl; Zisserman, Andrew
Format:	Conference Proceeding
Language:	English
Subjects:	Action Recognition; Context modeling; Deep Learning; Faces; Hands; Pattern recognition; Semantics; Spatiotemporal phenomena; Training; Transformers; Videos; Visualization
