Revisiting the "Video" in Video-Language Understanding

What makes a video task uniquely suited for videos, beyond what can be understood from a single image? Building on recent progress in self-supervised image-language models, we revisit this question in the context of video and language tasks. We propose the atemporal probe (ATP), a new model for vide...

Bibliographic Details
Main Authors:	Buch, Shyamal; Eyzaguirre, Cristobal; Gaidon, Adrien; Wu, Jiajun; Fei-Fei, Li; Niebles, Juan Carlos
Format:	Conference Proceeding
Language:	English
Subjects:	Analytical models; Benchmark testing; Buildings; Image recognition; Pattern recognition; Question answering (information retrieval); Task analysis; Video analysis and understanding; Vision + language