Revisiting the "Video" in Video-Language Understanding

What makes a video task uniquely suited for videos, beyond what can be understood from a single image? Building on recent progress in self-supervised image-language models, we revisit this question in the context of video and language tasks. We propose the atemporal probe (ATP), a new model for vide...

Bibliographic Details
Main Authors: Buch, Shyamal; Eyzaguirre, Cristobal; Gaidon, Adrien; Wu, Jiajun; Fei-Fei, Li; Niebles, Juan Carlos
Format: Conference Proceeding
Language: English