Building Scalable Video Understanding Benchmarks through Sports

Existing benchmarks for evaluating long video understanding fall short on two critical aspects: they lack either scale or annotation quality. These limitations stem from the difficulty of collecting dense annotations for long videos, which often requires manually labeling each frame. In this work, we introduce an automated Annotation and Video Stream Alignment Pipeline (abbreviated ASAP). We demonstrate the generality of ASAP by aligning unlabeled videos of four different sports with corresponding freely available dense web annotations (i.e., commentary). We then leverage ASAP's scalability to create LCric, a large-scale long video understanding benchmark with over 1,000 hours of densely annotated long cricket videos (with an average sample length of ~50 minutes) collected at virtually zero annotation cost. We benchmark and analyze state-of-the-art video understanding models on LCric through a large set of compositional multiple-choice and regression queries. We establish a human baseline that indicates significant room for new research to explore. Our human studies indicate that ASAP can align videos and annotations with high fidelity, precision, and speed. The dataset along with the code for ASAP and baselines can be accessed here: https://asap-benchmark.github.io/.
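The abstract's core idea, mapping a dense, timestamped text feed (ball-by-ball commentary) onto an unlabeled video stream, can be illustrated with a minimal sketch. The anchor-based interpolation heuristic below and all names in it are illustrative assumptions, not the authors' ASAP implementation; it only shows why a handful of known correspondences (e.g., a few events whose video timestamps were recovered manually or via scoreboard OCR) can place every remaining commentary entry on the video timeline.

# Hypothetical sketch: estimate a video timestamp for each commentary
# entry by piecewise-linear interpolation between sparse anchor events.
# Not the authors' pipeline; assumptions: commentary entries are ordered
# by ball index, and anchor_balls is sorted ascending.
from bisect import bisect_left

def align_commentary(ball_indices, anchor_balls, anchor_times):
    """Return an estimated video time (seconds) for each ball index."""
    aligned = []
    for b in ball_indices:
        j = bisect_left(anchor_balls, b)
        if j == 0:                        # before the first anchor
            aligned.append(anchor_times[0])
        elif j == len(anchor_balls):      # after the last anchor
            aligned.append(anchor_times[-1])
        else:                             # interpolate between neighbors
            b0, b1 = anchor_balls[j - 1], anchor_balls[j]
            t0, t1 = anchor_times[j - 1], anchor_times[j]
            frac = (b - b0) / (b1 - b0)
            aligned.append(t0 + frac * (t1 - t0))
    return aligned

# Example: anchors at balls 0, 60, 120 seen at 0 s, 1500 s, 3100 s.
print(align_commentary([30, 90], [0, 60, 120], [0.0, 1500.0, 3100.0]))
# -> [750.0, 2300.0]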

Bibliographic Details
Published in: arXiv.org, 2023-03-26
Main Authors: Agarwal, Aniket; Zhang, Alex; Narasimhan, Karthik; Gilitschenski, Igor; Murahari, Vishvak; Kant, Yash
Format: Article
Language: English
EISSN: 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Subjects: Annotations; Benchmarks; Cost analysis; Football; Frames per second; Sports; Video data
Online Access: Dataset and code at https://asap-benchmark.github.io/