
Real-time object detection method with single-domain generalization based on YOLOv8



Bibliographic Details
Published in:	Journal of real-time image processing 2024-12, Vol.21 (6), p.191, Article 191
Main Authors:	Zhou, Yipeng; Qian, Huaming
Format:	Article
Language:	English
Subjects:	Adaptability; Adaptation; Artificial neural networks; Computer Graphics; Computer Science; Datasets; Image enhancement; Image Processing and Computer Vision; Language; Multimedia Information Systems; Object recognition; Pattern Recognition; Real time; Semantics; Signal, Image and Speech Processing; Visual tasks

Description
Summary:	The prevailing models for object detection are often beset by a dearth of generalizability across domains. Specifically, while these models may perform exceptionally well on a given dataset, their efficacy can plummet when confronted with novel domains that lie beyond their training purview. The single-domain generalization methods based on Faster R-CNN are constrained by the underlying strategies, which not only exhibit slow speeds and suboptimal accuracy levels but also demonstrate inadequate generalization. This paper proposes a Complementary Pseudo Multi-domain Generation Method based on YOLOv8 (Y-CPMG). The methodology fortifies the generalization prowess by fabricating a spectrum of pseudo domain information within the feature space. To elaborate, we harness the capabilities of a pre-trained visual-language model, leveraging textual prompts to extract domain-specific feature enhancements. These enhancements are then amalgamated with the original images to simulate multi-domain scenarios. Building on this foundation, we delve deeper into the nuances of the real world by introducing normalization perturbation (NP) to uncover a variety of latent domain styles. This approach addresses potential limitations in visual-language models when emulating scenes of diverse styles. Empirical evaluations conducted across a spectrum of weather-diverse public datasets have demonstrated that the proposed method achieves a marked enhancement in performance for the task of domain generalization object detection. With an input dimension of 3 × 608 × 1088, the detection speed reaches 38 FPS, which represents a 65.2% improvement over Faster R-CNN-based methods, fully meeting the requirements for real-time processing.
ISSN:	1861-8200 1861-8219
DOI:	10.1007/s11554-024-01572-z
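
Note:	The summary mentions normalization perturbation (NP) without describing it. As a rough illustration only, NP is commonly understood as randomly rescaling the per-channel mean and standard deviation of a feature map so that training sees varied "styles". The sketch below is not the authors' implementation: the function name, the Gaussian noise centred at 1, the `scale` value of 0.75, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def normalization_perturbation(features, scale=0.75, rng=None):
    """Perturb per-channel statistics of a feature batch shaped (N, C, H, W).

    Hypothetical sketch: `scale` and the Gaussian noise model are assumptions,
    not values taken from the catalogued article.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = features.shape[:2]

    # Per-sample, per-channel statistics over the spatial dimensions.
    mean = features.mean(axis=(2, 3), keepdims=True)
    std = features.std(axis=(2, 3), keepdims=True) + 1e-5

    # Random multiplicative factors centred at 1 (assumed noise model).
    alpha = rng.normal(1.0, scale, size=(n, c, 1, 1))
    beta = rng.normal(1.0, scale, size=(n, c, 1, 1))

    # Normalize, then re-apply the perturbed standard deviation and mean.
    normalized = (features - mean) / std
    return normalized * (alpha * std) + beta * mean
```

With `scale=0.0` both factors equal 1 and the function returns the input unchanged (up to floating-point error), which is a convenient sanity check. In a detector, a perturbation of this kind would typically be applied to shallow backbone features during training only.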