A ConvNext-Based and Feature Enhancement Anchor-Free Siamese Network for Visual Tracking

Bibliographic Details
Published in:	Electronics (Basel) 2022-08, Vol.11 (15), p.2381
Main Authors:	Xu, Qiguo; Deng, Honggui; Zhang, Zeyu; Liu, Yang; Ruan, Xusheng; Liu, Gang
Format:	Article
Language:	English
Subjects:	Ablation; Accuracy; Algorithms; Artificial neural networks; Aspect ratio; Classification; Computer networks; Design; Euclidean geometry; Machine vision; Methods; Modules; Optical tracking; Redundancy; Target recognition; Tracking networks; Visual fields
Description
Summary:	Existing anchor-based Siamese trackers rely on anchor design to predict the scale and aspect ratio of the target. However, these methods introduce many hyperparameters, leading to computational redundancy. In this paper, to achieve high network efficiency, we propose a ConvNext-based anchor-free Siamese tracking network (CAFSN), which employs an anchor-free design to increase network flexibility and versatility. To obtain an appropriate backbone, CAFSN applies the state-of-the-art ConvNext network to visual tracking for the first time, adjusting its stride and receptive field. Moreover, a central confidence branch based on Euclidean distance is added to the classification prediction network of CAFSN to suppress low-quality predicted boxes and make tracking more robust. In particular, we argue that the Siamese network cannot establish a complete identification model for the tracking target and similar objects, which negatively impacts network performance. To better distinguish the target from similar objects, we build a Fusion network consisting of Crop and 3DMaxpooling. 3DMaxpooling selects the highest activation value to widen the gap between the target and similar objects, while Crop unifies the dimensions of different features and reduces the amount of computation. Ablation experiments demonstrate that this module increases success rate by 1.7% and precision by 0.5%. We evaluate CAFSN on challenging benchmarks such as OTB100, UAV123, and GOT-10K, validating its advanced performance in noise immunity and similar-target identification while running in real time at 58.44 FPS.
ISSN:	2079-9292
DOI:	10.3390/electronics11152381
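
The summary names two mechanisms without giving their formulas: a central confidence branch based on Euclidean distance, and 3D max pooling that keeps the highest activation. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation. The function names, the linear fall-off of confidence with distance, and the non-overlapping pooling window are all assumptions made for illustration.

```python
import math


def center_confidence(height, width, center):
    """Hypothetical centre-confidence map (assumed form, not from the paper).

    Returns 1.0 at `center` and falls linearly with Euclidean distance,
    reaching 0.0 at the grid cell farthest from `center`.
    """
    cy, cx = center
    dist = [[math.hypot(y - cy, x - cx) for x in range(width)]
            for y in range(height)]
    max_dist = max(max(row) for row in dist) or 1.0  # avoid dividing by zero on a 1x1 grid
    return [[1.0 - d / max_dist for d in row] for row in dist]


def max_pool_3d(volume, kernel):
    """Non-overlapping 3D max pooling over a [channel][row][col] nested list.

    The stride equals the kernel size (an assumption); any trailing
    remainder along an axis is dropped.
    """
    kc, kh, kw = kernel
    channels, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(volume[c + dc][y + dy][x + dx]
                    for dc in range(kc)
                    for dy in range(kh)
                    for dx in range(kw))
                for x in range(0, width - kw + 1, kw)
            ]
            for y in range(0, height - kh + 1, kh)
        ]
        for c in range(0, channels - kc + 1, kc)
    ]


# Toy classification scores: the corners score higher than the centre.
cls_scores = [[0.9, 0.2, 0.9],
              [0.2, 0.8, 0.2],
              [0.9, 0.2, 0.9]]
conf = center_confidence(3, 3, center=(1, 1))

# Re-weight each score by its centre confidence so far-away cells are suppressed.
weighted = [[s * c for s, c in zip(s_row, c_row)]
            for s_row, c_row in zip(cls_scores, conf)]

# Two 2x2 feature channels pooled jointly across channel and space.
features = [[[1, 2], [3, 4]],
            [[5, 6], [7, 8]]]
pooled = max_pool_3d(features, kernel=(2, 2, 2))
```

In this toy example the corner cells start with the highest raw scores, but after weighting the centre cell holds the largest value, which is the kind of suppression of off-centre predictions the summary describes. The pooling call reduces both channels to the single strongest activation.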