Visible-Infrared Person Re-Identification via Cross-Modality Interaction Transformer

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2023-01, Vol. 25, p. 1-13
Main Authors: Feng, Yujian; Yu, Jian; Chen, Feng; Ji, Yimu; Wu, Fei; Liu, Shangdong; Jing, Xiao-Yuan
Format: Article
Language: English
Description
Summary: Visible-infrared person re-identification (VI Re-ID) aims to match images of the same person captured by visible and infrared cameras. Transformer architectures have been applied successfully to VI Re-ID. However, previous Transformer-based methods were designed mainly to capture global content information within a single modality, and could not simultaneously perceive semantic information across the two modalities from a global perspective. To solve this problem, we propose a novel framework named the cross-modality interaction Transformer (CMIT). It models spatial and sequential features well enough to capture long-range dependencies, and it explicitly improves the discriminability of features by exchanging information across modalities, which helps in obtaining modality-invariant representations. Specifically, CMIT uses a cross-modality attention mechanism to enrich the feature representation of each patch token by letting it interact with the patch tokens of the other modality, and it aggregates the local features of a CNN structure with the global information of a Transformer structure to mine salient feature representations. Furthermore, a modality-discriminative (MD) loss function is proposed to learn the potential consistency between modalities, encouraging intra-modality compactness within each class and inter-modality separation between classes. Extensive experiments on two benchmarks demonstrate that our approach outperforms state-of-the-art methods.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2022.3224663
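
Illustrative sketch (not from the article): the summary describes a cross-modality attention mechanism in which each patch token is enriched by interacting with the patch tokens of the other modality. The Python below shows one generic way such an interaction could look, using plain scaled dot-product attention with queries from one modality and keys/values from the other. Everything in it is an assumption made for illustration: the function names, the omission of learned projections, the residual connection, and the toy token counts and dimensions. It is not the CMIT architecture, and it does not cover the CNN aggregation or the MD loss, whose details the record does not give.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: queries come from one modality,
    keys and values from the other. Learned Q/K/V projections are
    omitted here (an assumption made to keep the sketch short)."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    enriched = []
    for q in queries:
        # Similarity of this token to every token of the other modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of the other modality's tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, context))
                    for j in range(dim)]
        # Residual connection (assumed): enrich the token, do not replace it.
        enriched.append([qi + ai for qi, ai in zip(q, attended)])
    return enriched


def cross_modality_interaction(visible_tokens, infrared_tokens):
    """Hypothetical helper: exchange information in both directions."""
    vis = cross_attention(visible_tokens, infrared_tokens)
    ir = cross_attention(infrared_tokens, visible_tokens)
    return vis, ir


# Toy data: 4 visible and 6 infrared patch tokens, each of dimension 8.
random.seed(0)
visible = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
infrared = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]

vis_out, ir_out = cross_modality_interaction(visible, infrared)
print(len(vis_out), len(vis_out[0]), len(ir_out), len(ir_out[0]))  # prints: 4 8 6 8
```

Each output keeps the token count and dimension of its own modality, so the enriched tokens could in principle be passed on to later layers unchanged in shape. A practical model would add learned projections, multiple heads, and normalization.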