
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

Rich spatio-temporal information is crucial for capturing the complicated target appearance variations in visual tracking. However, most top-performing tracking algorithms rely on many hand-crafted components for spatio-temporal information aggregation; consequently, the spatio-temporal information is far from fully explored. To alleviate this issue, we propose an adaptive tracker with spatio-temporal transformers (named AQATrack), which adopts simple autoregressive queries to effectively learn spatio-temporal information without many hand-designed components. First, we introduce a set of learnable and autoregressive queries to capture the instantaneous target appearance changes in a sliding-window fashion. Then, we design a novel attention mechanism for the interaction of existing queries to generate a new query for the current frame. Finally, based on the initial target template and the learnt autoregressive queries, a spatio-temporal information fusion module (STM) is designed for spatio-temporal information aggregation to locate the target object. Benefiting from the STM, we can effectively combine the static appearance and instantaneous changes to guide robust tracking. Extensive experiments show that our method significantly improves the tracker's performance on six popular tracking benchmarks: LaSOT, LaSOT_ext, TrackingNet, GOT-10k, TNL2K, and UAV123. Code and models will be available at https://github.com/orgs/GXNU-ZhongLab.

Saved in:
Bibliographic Details
Main Authors: Xie, Jinxia, Zhong, Bineng, Mo, Zhiyi, Zhang, Shengping, Shi, Liangtao, Song, Shuxiang, Ji, Rongrong
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 19309
container_start_page 19300
creator Xie, Jinxia; Zhong, Bineng; Mo, Zhiyi; Zhang, Shengping; Shi, Liangtao; Song, Shuxiang; Ji, Rongrong
description Rich spatio-temporal information is crucial for capturing the complicated target appearance variations in visual tracking. However, most top-performing tracking algorithms rely on many hand-crafted components for spatio-temporal information aggregation; consequently, the spatio-temporal information is far from fully explored. To alleviate this issue, we propose an adaptive tracker with spatio-temporal transformers (named AQATrack), which adopts simple autoregressive queries to effectively learn spatio-temporal information without many hand-designed components. First, we introduce a set of learnable and autoregressive queries to capture the instantaneous target appearance changes in a sliding-window fashion. Then, we design a novel attention mechanism for the interaction of existing queries to generate a new query for the current frame. Finally, based on the initial target template and the learnt autoregressive queries, a spatio-temporal information fusion module (STM) is designed for spatio-temporal information aggregation to locate the target object. Benefiting from the STM, we can effectively combine the static appearance and instantaneous changes to guide robust tracking. Extensive experiments show that our method significantly improves the tracker's performance on six popular tracking benchmarks: LaSOT, LaSOT_ext, TrackingNet, GOT-10k, TNL2K, and UAV123. Code and models will be available at https://github.com/orgs/GXNU-ZhongLab.
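As a rough illustration of the sliding-window autoregressive query idea described in the abstract, the following is a minimal NumPy sketch — not the authors' AQATrack implementation. All function names, the window size, the feature dimension, and the plain dot-product attention used to mix past queries into a new one are assumptions chosen only to make the mechanism concrete: each incoming frame produces a new query by attending over the existing queries in the window, which is then appended while the oldest query is dropped.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def generate_next_query(queries, frame_feat, d=64):
    """Attend from the current frame feature over the existing window
    queries to produce this frame's query (hypothetical mechanism)."""
    # queries: (w, d) past queries; frame_feat: (d,) current-frame cue
    attn = softmax(frame_feat @ queries.T / np.sqrt(d))  # (w,) weights
    return attn @ queries                                # (d,) new query

def slide_window(queries, new_query, window=3):
    # append the new query, then keep only the most recent `window` queries
    queries = np.vstack([queries, new_query[None]])
    return queries[-window:]

rng = np.random.default_rng(0)
d, window = 64, 3
queries = rng.standard_normal((1, d))   # e.g. initialized from the template
for _ in range(5):                      # five incoming frames
    frame_feat = rng.standard_normal(d)
    q_new = generate_next_query(queries, frame_feat, d)
    queries = slide_window(queries, q_new, window)
print(queries.shape)  # (3, 64) — the window never exceeds 3 queries
```

The sliding window is what makes the queries "autoregressive" in spirit: each new query is a function of the previous ones, so instantaneous appearance changes accumulate across frames while stale queries age out.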
doi_str_mv 10.1109/CVPR52733.2024.01826
format conference_proceeding
eisbn 9798350353006
coden IEEPAD
publisher IEEE
fulltext fulltext_linktorsrc
identifier EISSN: 2575-7075
ispartof 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, p. 19300-19309
issn 2575-7075
language eng
recordid cdi_ieee_primary_10656727
source IEEE Xplore All Conference Series
subjects Adaptation models
Computational modeling
Computer vision
Target tracking
Transformers
Visualization
title Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers