Deep learning-based few-shot person re-identification from top-view RGB and depth images

Bibliographic Details
Published in: Neural computing & applications 2024-11, Vol.36 (31), p.19365-19382
Main Authors: Abed, Almustafa, Akrout, Belhassen, Amous, Ikram
Format: Article
Language: English
Description
Summary: Person re-identification (re-id) attempts to match a person across images taken at different time steps. Existing deep learning approaches use either appearance or geometry features for re-id, which does not provide the required robustness because of high intra-class similarity. Existing supervised re-id approaches train Convolutional Neural Networks (CNNs) on identity-labeled images captured by sensors from a horizontal view. The horizontal view compromises people's privacy because their faces appear in the image. Moreover, person re-id involves new, unseen people, yet a CNN cannot identify them because it lacks continual learning. Privacy-preserving, computer vision-assisted person re-id systems can benefit from visual appearance and geometry features extracted from top-view RGB and depth input. This paper presents a privacy-preserving top-view few-shot person re-id network that uses both appearance and geometry features. EfficientNet extracts appearance-based features from the RGB input, while PointNet extracts geometry features from the point cloud produced by RGB-D image registration. The concatenated EfficientNet and PointNet features are fed to a two-layer Bi-LSTM network for person identification. Finally, the whole network is converted into a few-shot network to achieve continual learning by removing the output layer and attaching a similarity measurement unit. The approach is CNN-based and is fine-tuned on TVPR/2, a publicly available dataset acquired with a top-view arrangement. Experimental results on the TVPR/2 and GODPR datasets show that the proposed re-id network outperforms other state-of-the-art networks.
ISSN: 0941-0643, 1433-3058
DOI:10.1007/s00521-024-10239-6
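
The few-shot step described in the summary (remove the output layer, then compare embeddings with a similarity measurement unit) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the record does not say which similarity measure is used, so cosine similarity, the per-identity mean "prototype", the acceptance threshold, and all function names (`fuse`, `build_prototypes`, `identify`) are assumptions. The short lists stand in for the EfficientNet and PointNet feature vectors, and the Bi-LSTM stage is omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse(appearance, geometry):
    """Concatenate appearance and geometry features into one embedding.

    Stand-in for the fused EfficientNet + PointNet features; the real
    network would pass them through further layers first.
    """
    return l2_normalize(list(appearance) + list(geometry))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def build_prototypes(support):
    """Average the few support embeddings of each identity (assumed scheme)."""
    prototypes = {}
    for identity, embeddings in support.items():
        dim = len(embeddings[0])
        count = len(embeddings)
        prototypes[identity] = [
            sum(e[i] for e in embeddings) / count for i in range(dim)
        ]
    return prototypes


def identify(query, prototypes, threshold=0.5):
    """Return (identity, similarity); identity is None below the threshold."""
    best_id, best_sim = None, -1.0
    for identity, proto in prototypes.items():
        sim = cosine(query, proto)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    if best_sim < threshold:
        return None, best_sim
    return best_id, best_sim


# A new person is enrolled by adding a few embeddings; no retraining needed.
support = {
    "person_a": [fuse([1.0, 0.0], [0.0, 0.0]), fuse([0.9, 0.1], [0.0, 0.0])],
    "person_b": [fuse([0.0, 0.0], [1.0, 0.0])],
}
prototypes = build_prototypes(support)
match_id, match_sim = identify(fuse([1.0, 0.05], [0.0, 0.0]), prototypes)
unknown_id, unknown_sim = identify(fuse([0.0, 0.0], [0.0, 1.0]), prototypes)
```

Because identities live in the `support` dictionary rather than in a fixed output layer, enrolling an unseen person only requires adding entries, which is the sense in which a similarity-based head supports continual learning.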