A Transformer-based network with adaptive spatial prior for visual tracking

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2025-01, Vol. 614, p. 128821, Article 128821
Main Authors: Cheng, Feng; Peng, Gaoliang; Li, Junbao; Zhao, Benqi; Pan, Jeng-Shyang; Li, Hang
Format: Article
Language: English
Description
Summary: Single object tracking (SOT) in complex scenes presents significant challenges in computer vision. In recent years, transformers have demonstrated their efficacy in visual object tracking tasks, owing to their capacity to capture long-range dependencies between image pixels. However, two limitations hinder further performance improvement of transformer-based trackers. First, a transformer partitions the image into a sequence of patches, which disrupts the internal structural information of the object. Second, transformer-based trackers encode the target template and search region together, potentially leading to confusion between the target and background during feature interaction. To address these issues, we propose a fully transformer-based tracking framework that learns structural prior information, called SPformer. Specifically, a self-attention spatial-prior generative network is established to simulate the spatial associations between features. Moreover, cross-attention structural prior extractors based on Gaussian and arbitrary distributions are developed to seek the semantic interaction features between the object template and the search region, effectively mitigating feature confusion. Extensive experiments on eight prevailing benchmarks demonstrate that SPformer outperforms existing state-of-the-art (SOTA) trackers. We further analyze the effectiveness of the two proposed prior modules and validate their application in target tracking models.
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.128821