Multitask Multigranularity Aggregation With Global-Guided Attention for Video Person Re-Identification

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-11, Vol. 32 (11), p. 7758-7771
Main Authors: Sun, Dengdi; Huang, Jiale; Hu, Lei; Tang, Jin; Ding, Zhuanlian
Format: Article
Language: English
Description
Summary: The goal of video-based person re-identification (Re-ID) is to identify the same person across multiple non-overlapping cameras. The key to this challenging task is to fully exploit both the spatial and the temporal cues in video sequences. However, most current methods cannot accurately locate semantic regions or efficiently filter discriminative spatio-temporal features, so they struggle with issues such as spatial misalignment and occlusion. We therefore propose a novel feature aggregation framework, multi-task and multi-granularity aggregation with global-guided attention (MMA-GGA), which aims to adaptively generate more representative spatio-temporal aggregation features. Specifically, we develop a multi-task multi-granularity aggregation (MMA) module that extracts features at different locations and scales to identify key semantic-aware regions that are robust to spatial misalignment. Then, to determine the importance of the multi-granular semantic information, we propose a global-guided attention (GGA) mechanism that learns weights based on the global features of the video sequence, allowing our framework to identify stable local features while ignoring occlusions. As a result, the MMA-GGA framework can efficiently and effectively capture more robust and representative features. Extensive experiments on four benchmark datasets demonstrate that our MMA-GGA framework outperforms current state-of-the-art methods. In particular, our method achieves a rank-1 accuracy of 91.0% on the MARS dataset, the most widely used database, significantly outperforming existing methods.
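
The summary describes GGA only at a high level: weights for local features are learned from the global features of the sequence. The sketch below is a generic illustration of that idea, not the authors' implementation. It assumes the weights come from a scaled dot product between each local feature and the global feature, followed by a softmax; the function names, the scoring rule, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_guided_aggregate(local_feats, global_feat):
    """Weight local features by their agreement with a global feature.

    Assumption: similarity is a scaled dot product. The record does not
    say how the paper computes its attention weights.
    """
    dim = len(global_feat)
    scores = [
        sum(l * g for l, g in zip(feat, global_feat)) / math.sqrt(dim)
        for feat in local_feats
    ]
    weights = softmax(scores)
    # Weighted sum of the local features, one output value per dimension.
    aggregated = [
        sum(w * feat[i] for w, feat in zip(weights, local_feats))
        for i in range(dim)
    ]
    return weights, aggregated


# Toy data: the third local feature disagrees with the global feature,
# standing in for an occluded region.
local_feats = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
global_feat = [1.0, 0.0]
weights, aggregated = global_guided_aggregate(local_feats, global_feat)
```

In this toy setup the local feature that points away from the global feature receives the smallest weight, which mirrors the stated goal of keeping stable local features while down-weighting occluded ones.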
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3183011