DeepList: Learning Deep Features With Adaptive Listwise Constraint for Person Reidentification
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2017-03, Vol. 27 (3), pp. 513-524
Main Authors: , , , ,
Format: Article
Language: English
Subjects:
Summary: Person reidentification (re-id) aims to match a specific person across nonoverlapping cameras, an important but challenging task in video surveillance. Conventional methods focus mainly on feature construction or metric learning. Recently, deep learning-based methods have been proposed to learn image features and similarity measures jointly. However, current deep models for person re-id are usually trained either with a pairwise loss, where negative pairs greatly outnumber positive pairs and may bias the trained model toward negatives, or with a constant-margin hinge loss, which ignores the fact that hard negative samples deserve more attention during training. In this paper, we propose to learn deep representations with an adaptive-margin listwise loss. First, ranking lists rather than image pairs are used as training samples, which relaxes the data-imbalance problem. Second, an adaptive margin parameter in the listwise loss assigns larger margins to harder negative samples, which can be interpreted as an automatic hard-negative-mining strategy (an illustrative sketch of such a loss follows this record). To gain robustness against changes in pose and partial occlusions, our architecture combines four convolutional neural networks, each of which embeds images at a different scale or body part; the final combined model performs much better than any single model. Experimental results show that our approach achieves very promising results on the challenging CUHK03, CUHK01, and VIPeR data sets.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2016.2586851
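
The adaptive-margin listwise loss described in the summary lends itself to a short illustration. The sketch below is a minimal PyTorch rendering of the idea, not the authors' implementation: the function name `adaptive_margin_listwise_loss`, the sigmoid margin schedule, and the hyperparameters `base_margin` and `scale` are all assumptions chosen to show how harder negatives can receive larger margins.

```python
import torch
import torch.nn.functional as F

def adaptive_margin_listwise_loss(query, positive, negatives,
                                  base_margin=0.2, scale=1.0):
    """Sketch of a listwise loss with an adaptive margin.

    query:     (D,)   embedding of the probe image
    positive:  (D,)   embedding of the matching gallery image
    negatives: (N, D) embeddings of the non-matching gallery images
                      forming the rest of the ranking list

    Harder negatives (those lying closer to the query) receive a
    larger margin, which acts as an implicit hard-negative-mining
    step. The sigmoid schedule below is an assumption made for
    illustration, not the paper's exact formulation.
    """
    d_pos = torch.norm(query - positive)                       # scalar
    d_neg = torch.norm(negatives - query.unsqueeze(0), dim=1)  # (N,)

    # Margin grows as a negative gets closer to the query:
    # sigmoid(-scale * d_neg) approaches 0.5 for near negatives
    # and 0 for distant (easy) ones.
    margin = base_margin * (1.0 + torch.sigmoid(-scale * d_neg))

    # Hinge over the whole ranking list: the positive must beat
    # every negative by that negative's own adaptive margin.
    return F.relu(d_pos + margin - d_neg).mean()

# Toy usage with random 128-D embeddings: one positive, five negatives.
q = torch.randn(128, requires_grad=True)
p = torch.randn(128)
n = torch.randn(5, 128)
loss = adaptive_margin_listwise_loss(q, p, n)
loss.backward()
```

In the paper itself, this list-level loss is combined with a four-stream architecture whose branches embed different scales and body parts; the snippet above only illustrates the margin mechanism on a single ranking list.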