
DeepList: Learning Deep Features With Adaptive Listwise Constraint for Person Reidentification

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2017-03, Vol. 27 (3), p. 513-524
Main Authors: Wang, Jin; Wang, Zheng; Gao, Changxin; Sang, Nong; Huang, Rui
Format: Article
Language:English
Description
Summary: Person reidentification (re-id) aims to match a specific person across nonoverlapping cameras, an important but challenging task in video surveillance. Conventional methods focus mainly on either feature construction or metric learning. Recently, some deep learning-based methods have been proposed to learn image features and similarity measures jointly. However, current deep models for person re-id are usually trained either with a pairwise loss, where negative pairs greatly outnumber positive pairs and may bias the trained model toward negatives, or with a constant-margin hinge loss, which ignores the fact that hard negative samples deserve more attention during training. In this paper, we propose to learn deep representations with an adaptive-margin listwise loss. First, ranking lists rather than image pairs are used as training samples, which relaxes the data-imbalance problem. Second, an adaptive margin parameter in the listwise loss function assigns larger margins to harder negative samples, which can be interpreted as an implementation of an automatic hard-negative-mining strategy. To gain robustness against pose changes and part occlusions, our architecture combines four convolutional neural networks, each of which embeds images from different scales or different body parts. The combined model performs much better than any single model. Experimental results show that our approach achieves very promising results on the challenging CUHK03, CUHK01, and VIPeR data sets.
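The record does not reproduce the paper's loss formula, but the core idea described above — a hinge-style loss over a ranking list whose margin grows for harder (closer) negatives — can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact loss; the distance metric, the `hardness` weighting, and all parameter names are assumptions.

```python
import numpy as np

def adaptive_listwise_loss(query, positive, negatives,
                           base_margin=0.2, hardness_scale=1.0):
    """Illustrative adaptive-margin listwise hinge loss (NOT the paper's
    exact formulation): negatives closer to the query ("harder" ones)
    receive a larger margin, mimicking automatic hard-negative mining."""
    d_pos = np.linalg.norm(query - positive)
    d_negs = np.array([np.linalg.norm(query - n) for n in negatives])
    # Hardness in (0, 1]: a negative close to the query scores near 1,
    # a distant (easy) negative scores near 0.
    hardness = np.exp(-hardness_scale * d_negs)
    # Adaptive per-negative margin: base margin enlarged for hard negatives.
    margins = base_margin * (1.0 + hardness)
    # Hinge: penalize each negative not separated from the positive
    # by at least its margin.
    violations = np.maximum(0.0, margins + d_pos - d_negs)
    return violations.mean()
```

With a well-separated list the loss vanishes, while a negative sitting close to the query both violates its margin and receives a larger one, so it dominates the average — a simple stand-in for the "more attention to hard negatives" behavior the abstract describes.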
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2016.2586851