Compare and Focus: Multi-Scale View Aggregation for Crowd Counting

Bibliographic Details
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2024-10, Vol. 25 (10), p. 13231-13239
Main Authors:	Jiang, Shengqin; Cai, Jialu; Zhang, Haokui; Liu, Yu; Liu, Qingshan
Format:	Article
Language:	English
Subjects:	Context awareness; Convolutional neural networks; Crowd counting; Crowdsourcing; Data mining; Feature extraction; Location awareness; multi-scale views; ROI extraction; Semantics; transformer; Transformers

Description
Summary:	Recent state-of-the-art (SOTA) methods use dedicated context extractors to capture global information, which serves as a key clue for describing crowd density. A promising alternative is the transformer-based model, which inherently captures long-range context dependencies. Recent related studies have made impressive progress, yet two issues remain: (1) Heads appear large near the camera and small far from it, and existing models do not cope well with these size variations. (2) Samples are unevenly distributed across densities in the dataset, which leads to poor network performance on densities with few samples. To address these issues, we propose to aggregate multi-scale views through Compare and Focus strategies. For the first strategy, we mine differential hints from multi-scale view features to capture heads of varying sizes. This can reduce the influence of redundant information while perceiving the subtleties of the various view inputs, making it simpler to establish discriminative representations. For the second strategy, we introduce a new activation function to formulate a Region of Interest (ROI) extraction module that enables the network to focus on relevant regions. It can alleviate the extreme imbalance in the distribution of samples across densities. Finally, experiments show that our method achieves SOTA performance on four challenging datasets.
ISSN:	1524-9050; 1558-0016
DOI:	10.1109/TITS.2024.3432789