
Multi-attention associate prediction network for visual tracking

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2025-01, Vol. 614, p. 128785, Article 128785
Main Authors: Sun, Xinglong; Sun, Haijiang; Jiang, Shan; Wang, Jiacheng; Wei, Xilai; Hu, Zhonghe
Format: Article
Language: English
Description
Summary: Classification-regression prediction networks have achieved impressive success in many modern deep trackers. However, classification and regression are inherently different tasks, so they place different, even opposite, demands on feature matching. Existing models ignore this key issue and employ a single unified matching block in both task branches, which degrades decision quality. These models also suffer from decision misalignment. In this paper, we propose a multi-attention associate prediction network (MAPNet) to tackle these problems. Concretely, we first design two novel matchers for feature comparison, a category-aware matcher and a spatial-aware matcher, which organically integrate self, cross, channel, or spatial attention. They fully capture the category-related semantics needed for classification and the local spatial contexts needed for regression, respectively. Then, we present a dual alignment module that strengthens the correspondence between the two branches, which helps find the optimal tracking solution. Finally, we describe a Siamese tracker built upon the proposed prediction network, which achieves leading performance on five tracking benchmarks (LaSOT, TrackingNet, GOT-10k, TNL2k, and UAV123) and surpasses other state-of-the-art approaches.
Highlights:
• Two novel feature matchers are proposed to fully capture category semantic patterns and detailed spatial cues.
• We present an associate prediction network that achieves both robust classification and precise localization.
• Extensive experimental results demonstrate that the proposed method is superior to existing trackers.
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128785
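The summary describes two matchers assembled from self, cross, channel, and spatial attention. One is tuned to category semantics for classification, and the other is tuned to local spatial context for regression. This record gives no architectural details, so the block below is only a minimal NumPy sketch of how such attention blocks are commonly composed. It is not MAPNet's implementation. Several parts are hypothetical assumptions:

* the function names (`category_aware_match`, `spatial_aware_match`);
* the token shapes;
* attention applied without learned query/key/value projections;
* parameter-free sigmoid gates standing in for learned channel-attention (squeeze-and-excitation style) and spatial-attention layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention(q, k, v):
    # Scaled dot-product attention: q is (Nq, C), k and v are (Nk, C).
    # Assumption: no learned Q/K/V projections, to keep the data flow visible.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def channel_attention(x):
    # Hypothetical parameter-free channel gate: pool over tokens,
    # squash each channel to (0, 1), then reweight the channels.
    gate = sigmoid(x.mean(axis=0))                  # shape (C,)
    return x * gate                                 # broadcast over tokens

def spatial_attention(x):
    # Hypothetical parameter-free spatial gate: pool over channels,
    # squash each token position to (0, 1), then reweight the positions.
    gate = sigmoid(x.mean(axis=1, keepdims=True))   # shape (N, 1)
    return x * gate

def category_aware_match(search, template):
    # Hypothetical classification-branch matcher: self-attention on the
    # search tokens, cross-attention to the template, then channel attention.
    s = search + attention(search, search, search)
    s = s + attention(s, template, template)
    return channel_attention(s)

def spatial_aware_match(search, template):
    # Hypothetical regression-branch matcher: cross-attention to the
    # template, then spatial attention to emphasize local positions.
    s = search + attention(search, template, template)
    return spatial_attention(s)

rng = np.random.default_rng(0)
template = rng.standard_normal((64, 32))   # e.g. 8x8 template tokens, 32 channels
search = rng.standard_normal((256, 32))    # e.g. 16x16 search tokens, 32 channels

cls_feat = category_aware_match(search, template)
reg_feat = spatial_aware_match(search, template)
print(cls_feat.shape, reg_feat.shape)  # (256, 32) (256, 32)
```

Splitting the matchers means each branch can emphasize what it needs: channel reweighting for "what" (category) and per-position reweighting for "where" (location). This reflects the motivation stated in the summary. In a trained tracker, the gates and attention layers would carry learnable weights.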