Loading…

Robust Tracking via Combing Top-Down and Bottom-Up Attention

Transformer attention plays an important role in current top-performing trackers. However, it is bottom-up, driven by stimulus and lacks intrinsic prior guidance. This bottom-up attention mechanism leads to an emphasis on all objects in the input images, rather than the task related objects. As a re...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on circuits and systems for video technology 2024-10, Vol.34 (10), p.9774-9785
Main Authors:	Li, Ning, Zhong, Bineng, Zheng, Yaozong, Liang, Qihua, Mo, Zhiyi, Song, Shuxiang
Format:	Article
Language:	English
Subjects:	Bayes methods bottom-up Feature extraction Generators Object tracking Target tracking Task analysis top-down Transformers vision transformer Visualization
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Transformer attention plays an important role in current top-performing trackers. However, it is bottom-up, driven by stimulus and lacks intrinsic prior guidance. This bottom-up attention mechanism leads to an emphasis on all objects in the input images, rather than the task related objects. As a result, the performance of the bottom-up attention based trackers is deteriorated in complicated scenes. To address this issue, we propose a robust tracker that combines bottom-up attention with top-down attention to comply with the existing ViT framework, named TBTrack. TBTrack can not only utilize the existing bottom-up attention mechanisms to model the long-range relationship of input tokens, but also utilize a newly added top-down attention mechanism to pay more attention to task related object and further eliminate interference from similar objects and backgrounds. Specifically, we firstly design a top-down prior generation module using an adaptive learning parameter combined with the template inputs to obtain top-down task guided signals. Then, we inject the prior signals into a bottom-up attention module to obtain a top-down and bottom-up attention combination block (TB-Block). Finally, we stack these TB-Blocks to construct our tracker (TBTrack) with top-down prior guidance capability, which focuses more on the task related object. Through extensive experiments, our TBTrack achieves impressive performance on multiple tracking benchmarks, including GOT-10k, LaSOT, LaSOT _{ext} , TNL2K, TrackingNet, UAV123 and so on. The code and trained models will be publicly available.
ISSN:	1051-8215 1558-2205
DOI:	10.1109/TCSVT.2024.3402436