Convolutional Neural Network for Head Segmentation and Counting in Crowded Retail Environment Using Top-view Depth Images

Bibliographic Details
Published in:	Arabian journal for science and engineering (2011) 2024-03, Vol.49 (3), p.3735-3749
Main Authors:	Abed, Almustafa; Akrout, Belhassen; Amous, Ikram
Format:	Article
Language:	English
Subjects:	Artificial neural networks; Big Data; Coders; Datasets; Deep learning; Engineering; Humanities and Social Sciences; Illuminance; Machine learning; multidisciplinary; Research Article-Computer Engineering and Computer Science; Retail stores; Science; Semantic segmentation; Semantics

Description
Summary:	Since the emergence of big data, deep learning models have grown in popularity and are now applied in a wide range of settings, including people detection and counting in congested environments. Detecting and counting people for human behavior analysis in retail stores is a challenging research problem because the environment is crowded. This paper proposes a deep learning approach for detecting and counting people under occlusion and illuminance variation in a crowded retail environment, using deep CNNs (DCNNs) for semantic segmentation of top-view depth data. DCNNs have been widely used for semantic segmentation in recent years because they are a powerful approach. The objective of this paper is to design a novel encoder–decoder architecture. To address the problem of insufficient training data, we use transfer learning: ResNet50 serves as the encoder, and the decoder is built as a novel contribution. The model was trained and evaluated on the TVHeads dataset and the People Counting Dataset (PCDS), both of which are available for research purposes. The data consist of depth data of people captured by a top-view RGB-D sensor. The segmentation results indicate high accuracy and demonstrate that the proposed model is robust.
ISSN:	2193-567X 1319-8025 2191-4281
DOI:	10.1007/s13369-023-08159-z
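
Illustrative note: the summary describes segmenting heads in top-view depth images and then counting people, but this record does not say how the count is derived from the segmentation. The sketch below shows one common way this could be done, assuming the segmentation output is a binary mask (1 = head pixel) and that each connected region is one head. The function name `count_heads`, the `min_area` parameter, and the example mask are hypothetical and are not taken from the paper.

```python
from collections import deque


def count_heads(mask, min_area=1):
    """Count 8-connected foreground regions in a binary mask.

    mask is a list of rows containing 0 (background) or 1 (head pixel).
    Regions smaller than min_area pixels are ignored as noise.
    Treating one region as one head is an assumption, not the paper's method.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over the region containing (r, c).
            area = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:
                count += 1
    return count


# Hypothetical 4x6 mask with three separate regions of 4, 3, and 1 pixels.
example_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0],
]
```

Raising `min_area` discards small spurious regions, at the cost of missing heads that are heavily occluded; touching heads would still merge into one region, which a real system would need to handle separately.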