Real-time object detection method with single-domain generalization based on YOLOv8

Bibliographic Details
Published in: Journal of Real-Time Image Processing 2024-12, Vol.21 (6), p.191, Article 191
Main Authors: Zhou, Yipeng; Qian, Huaming
Format: Article
Language: English
Description
Summary: Prevailing object detection models often generalize poorly across domains. A model may perform exceptionally well on a given dataset, yet its performance can drop sharply on novel domains outside its training distribution. Existing single-domain generalization methods based on Faster R-CNN are constrained by their underlying detector: they are slow, achieve suboptimal accuracy, and still generalize inadequately. This paper proposes a Complementary Pseudo Multi-domain Generation method based on YOLOv8 (Y-CPMG). The method strengthens generalization by generating a range of pseudo-domain information in the feature space. Specifically, it uses a pre-trained visual-language model and textual prompts to extract domain-specific feature enhancements, which are then combined with the original images to simulate multi-domain scenarios. Building on this, normalization perturbation (NP) is introduced to uncover a variety of latent domain styles, which addresses potential limitations of visual-language models in emulating scenes of diverse styles. Experiments on several public datasets covering diverse weather conditions show that the proposed method markedly improves performance on domain-generalized object detection. With an input size of 3 × 608 × 1088, the detection speed reaches 38 FPS, a 65.2% improvement over Faster R-CNN-based methods, which meets the requirements for real-time processing.
ISSN: 1861-8200, 1861-8219
DOI: 10.1007/s11554-024-01572-z
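
Illustrative note on the summary: the abstract mentions normalization perturbation (NP) without describing it. The sketch below shows one common formulation of the idea, in which the per-channel mean and standard deviation of a feature map are rescaled by random factors so that the same content appears under a different "style". It is not taken from this record or confirmed as the authors' implementation. The function name, the Gaussian noise with mean 1, and the default spread of 0.75 are all assumptions made for illustration.

```python
import numpy as np


def normalization_perturbation(feats, noise_std=0.75, rng=None):
    """Randomly rescale per-channel statistics of features shaped (N, C, H, W).

    Hypothetical helper: the noise distribution and noise_std are assumed
    values for illustration, not settings reported in the record above.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = feats.shape[:2]

    # Per-sample, per-channel statistics over the spatial dimensions.
    mu = feats.mean(axis=(2, 3), keepdims=True)
    sigma = feats.std(axis=(2, 3), keepdims=True)

    # Random scaling factors centred on 1 (1 means "no perturbation").
    alpha = rng.normal(1.0, noise_std, size=(n, c, 1, 1))
    beta = rng.normal(1.0, noise_std, size=(n, c, 1, 1))

    # Normalize, then re-apply the perturbed standard deviation and mean.
    eps = 1e-5
    normalized = (feats - mu) / (sigma + eps)
    return alpha * sigma * normalized + beta * mu


if __name__ == "__main__":
    x = np.random.default_rng(0).normal(size=(2, 3, 4, 5))
    y = normalization_perturbation(x, rng=np.random.default_rng(1))
    print(x.shape, y.shape)
```

In a detector, a perturbation of this kind would typically be applied only during training and only to shallow backbone features; where Y-CPMG applies it is not stated in the summary.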