Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation

Bibliographic Details
Published in:	IEEE transactions on multimedia 2024-08, p.1-12
Main Authors:	Tang, Yushun, Chen, Shuoshuo, Kan, Zhehan, Zhang, Yi, Guo, Qinghai, He, Zhihai
Format:	Article
Language:	English
Subjects:	Adaptation models; Data models; Domain Shift; Learning systems; Task analysis; Test-time Adaptation; Training; Transformers; Visual Conditioning Token; Visualization

Description
Summary:	Fully test-time adaptation aims to adapt a network model during the inference stage by sequentially analyzing input samples, in order to address the cross-domain performance degradation of deep neural networks. This work is based on the following interesting finding: in transformer-based image classification, the class token at the first transformer encoder layer can be learned to capture the domain-specific characteristics of target samples during test-time adaptation. When combined with the input image patch embeddings, this learned token gradually removes the domain-specific information from the feature representations of the input samples during transformer encoding, thereby significantly improving the test-time adaptation performance of the source model across different domains. We refer to this class token as the visual conditioning token (VCT). To learn the VCT successfully, we propose a bi-level learning approach that captures the long-term variations of domain-specific characteristics while accommodating local variations of instance-specific characteristics. Experimental results on benchmark datasets demonstrate that the proposed bi-level visual conditioning token learning method improves test-time adaptation performance by up to 1.9%.
ISSN:	1520-9210
DOI:	10.1109/TMM.2024.3443633
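The summary describes two mechanisms: a learned token placed in the class-token slot of the first encoder layer, and a bi-level scheme that separates long-term (domain-level) and local (instance-level) variation. The following is a minimal NumPy sketch of that structure only, under stated assumptions; it is not the authors' implementation. The function names, the caller-supplied gradient function `grad_fn`, and the exponential-moving-average rule for the long-term update are all hypothetical illustrations, since the record does not specify the loss or the update rules.

```python
import numpy as np


def prepend_token(patch_embeddings: np.ndarray, token: np.ndarray) -> np.ndarray:
    """Place a (D,) token in front of (N, D) patch embeddings, giving (N + 1, D).

    Position 0 is where a ViT-style encoder normally holds its class token.
    """
    return np.concatenate([token[None, :], patch_embeddings], axis=0)


def adapt_instance_token(long_term_token, grad_fn, lr=0.1, steps=1):
    """Instance-level step (hypothetical): start from the long-term token and
    take a few gradient steps on a caller-supplied loss gradient."""
    token = long_term_token.copy()  # never mutate the long-term token here
    for _ in range(steps):
        token = token - lr * grad_fn(token)
    return token


def update_long_term_token(long_term_token, instance_token, momentum=0.9):
    """Domain-level step (assumed EMA rule): drift slowly toward the
    instance-adapted token so long-term domain statistics accumulate."""
    return momentum * long_term_token + (1.0 - momentum) * instance_token
```

As a usage example, with a toy quadratic loss whose gradient is `t - target`, each incoming sample first produces an instance-adapted token, and the long-term token then moves a small fraction of the way toward it. A large `momentum` keeps the domain-level token stable while each instance token absorbs sample-specific variation, which mirrors the long-term versus local split described in the summary.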