Dual Feature Fusion Tracking with Combined Cross-correlation and Transformer
Published in: IEEE Access, 2023-01, Vol. 11, p. 1-1
Main Authors: , , , ,
Format: Article
Language: English
Summary: Siamese networks have found applications in various fields, notably object tracking, owing to their remarkable speed and accuracy. Siamese tracking networks rely on cross-correlation to obtain the similarity score between the target template and the search region. However, since cross-correlation is a local matching operation, it cannot effectively capture global context information. Conversely, while Transformer-based feature fusion better captures long-range dependencies and richer semantic information, it lacks the localized edge information needed to distinguish the target from the background. Cross-correlation fusion and Transformer fusion thus have complementary strengths, so we combine them and propose a dual feature fusion tracker (SiamCT) that obtains both the local correlations and the global dependencies between the target and the search region. Specifically, we construct two parallel feature fusion paths based on cross-correlation and the Transformer. For cross-correlation fusion, we adopt the more efficient two-dimensional pixel-wise cross-correlation (TDPC), which performs correlation operations along both the spatial and channel dimensions; this interaction of multidimensional information enables more accurate feature fusion. The fused features are then augmented by coordinate attention (CA) to encode orientation-dependent positional information. For Transformer fusion, we introduce cos-based linear attention (ClA) to improve the Transformer's ability to acquire global context information. Extensive experiments show that SiamCT outperforms existing leading methods on the GOT-10k, LaSOT, TrackingNet, and OTB100 benchmarks.
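The abstract's core primitive is pixel-wise cross-correlation: each spatial location of the template feature map acts as a 1×1 kernel whose channel vector is matched against every location of the search region. The following is a minimal numpy sketch of that spatial-matching half of the idea; it is an illustration only, not the authors' TDPC implementation (which additionally performs correlation along the channel dimension), and the function name is ours.

```python
import numpy as np

def pixelwise_cross_correlation(template, search):
    """Illustrative pixel-wise cross-correlation (not the paper's code).

    Each template pixel's C-dimensional feature vector is used as a
    1x1 kernel and correlated (dot product over channels) with every
    spatial location of the search region.

    template: (C, Ht, Wt) feature map of the target template
    search:   (C, Hs, Ws) feature map of the search region
    returns:  (Ht*Wt, Hs, Ws) similarity maps, one per template pixel
    """
    C, Ht, Wt = template.shape
    _, Hs, Ws = search.shape
    t = template.reshape(C, Ht * Wt).T   # (Ht*Wt, C): one row per template pixel
    s = search.reshape(C, Hs * Ws)       # (C, Hs*Ws): one column per search pixel
    sim = t @ s                          # channel-wise dot products
    return sim.reshape(Ht * Wt, Hs, Ws)
```

Compared with the classic depth-wise correlation that slides the whole template as one kernel, this formulation yields a separate response map per template pixel, which preserves fine spatial detail for the fusion stage.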
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3346044