
BootsTAP: Bootstrapped Training for Tracking-Any-Point

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
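
The abstract summarizes the approach as a self-supervised student-teacher setup trained on unlabeled real video. The sketch below is a minimal illustration of that general idea, not the authors' implementation: the `student`, `teacher`, and `augment` callables, the tensor shapes, the EMA teacher update, and the Huber consistency loss are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): a student-teacher
# consistency loss for point tracking on unlabeled video. The models,
# shapes, and augmentation interface below are assumptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Keep teacher weights as an exponential moving average of the student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)


def consistency_loss(student, teacher, video, queries, augment):
    """
    video:   (B, T, C, H, W) unlabeled clip
    queries: (B, N, 2) query points (x, y) in the first frame
    augment: returns a spatially transformed clip plus the warp applied
             to pixel coordinates, so predictions can be compared in a
             common frame of reference.
    """
    # Teacher predicts tracks on the original clip; these act as pseudo-labels.
    with torch.no_grad():
        pseudo_tracks = teacher(video, queries)            # (B, N, T, 2)

    # Student predicts on an augmented view of the same clip.
    aug_video, warp_xy = augment(video)
    student_tracks = student(aug_video, warp_xy(queries))  # (B, N, T, 2)

    # Self-supervised signal: the student's tracks should agree with the
    # teacher's tracks once both are expressed in the augmented coordinates.
    # A robust loss limits the influence of unreliable pseudo-labels.
    return F.huber_loss(student_tracks, warp_xy(pseudo_tracks))
```

In a training loop, gradient steps on the student would alternate with `ema_update(teacher, student)`; the actual BootsTAP losses and augmentations differ in detail and are described in the paper.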

Bibliographic Details
Published in: arXiv.org, 2024-05
Main Authors: Doersch, Carl; Luc, Pauline; Yang, Yi; Gokay, Dilara; Koppula, Skanda; Gupta, Ankush; Heyward, Joseph; Rocco, Ignacio; Goroshin, Ross; Carreira, João; Zisserman, Andrew
Format: Article
Language: English
Subjects: Algorithms; Solid surfaces; Tracking; Training
Identifier: EISSN 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Source: Publicly Available Content (ProQuest)