Real-time object detection method with single-domain generalization based on YOLOv8
Published in: | Journal of Real-Time Image Processing, 2024-12, Vol. 21 (6), p. 191, Article 191 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Prevailing object detection models often generalize poorly across domains: a model that performs exceptionally well on a given dataset can degrade sharply when confronted with novel domains outside its training distribution. Existing single-domain generalization methods based on Faster R-CNN are constrained by their underlying strategies, which are slow, reach only suboptimal accuracy, and still generalize inadequately. This paper proposes a Complementary Pseudo Multi-domain Generation method based on YOLOv8 (Y-CPMG), which strengthens generalization by generating a range of pseudo-domain information in the feature space. Specifically, we use a pre-trained vision-language model with textual prompts to extract domain-specific feature enhancements, which are then combined with the original images to simulate multi-domain scenarios. To better capture the variety of real-world conditions, we further introduce normalization perturbation (NP) to uncover a variety of latent domain styles, addressing the limitations of vision-language models when emulating scenes of diverse styles. Empirical evaluations on several public datasets covering diverse weather conditions show that the proposed method markedly improves performance on domain-generalized object detection. With an input size of 3 × 608 × 1088, the detection speed reaches 38 FPS, a 65.2% improvement over Faster R-CNN-based methods, which fully meets real-time processing requirements. |
---|---|
ISSN: | 1861-8200, 1861-8219 |
DOI: | 10.1007/s11554-024-01572-z |
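The summary does not say how the text-prompted enhancements are computed or fused. One common way to picture text-driven pseudo-domain generation, offered here purely as an illustration and not as Y-CPMG's actual module, is to take the direction between two L2-normalized text embeddings (for example, a weather prompt versus a neutral prompt) and add a scaled copy of it to an image feature vector. The toy vectors below stand in for a vision-language model's text-encoder output; `text_direction`, `make_pseudo_domains`, and `strength` are hypothetical names.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def text_direction(domain_emb: Vector, neutral_emb: Vector) -> Vector:
    """Hypothetical domain offset: difference of two normalized text embeddings."""
    d, n = l2_normalize(domain_emb), l2_normalize(neutral_emb)
    return [a - b for a, b in zip(d, n)]


def make_pseudo_domains(
    image_feat: Vector, offsets: Dict[str, Vector], strength: float = 0.5
) -> Dict[str, Vector]:
    """Return one shifted copy of the image feature per simulated domain."""
    pseudo = {}
    for name, offset in offsets.items():
        if len(offset) != len(image_feat):
            raise ValueError(f"offset '{name}' has the wrong dimension")
        pseudo[name] = [f + strength * o for f, o in zip(image_feat, offset)]
    return pseudo


if __name__ == "__main__":
    # Toy 3-D "text embeddings"; a real vision-language model would produce these
    # from prompts such as "a photo taken on a foggy day" vs. "a photo".
    neutral = [1.0, 0.0, 0.0]
    offsets = {
        "fog": text_direction([0.0, 1.0, 0.0], neutral),
        "night": text_direction([0.0, 0.0, 2.0], neutral),
    }
    feat = [0.2, 0.4, 0.6]
    for name, vec in make_pseudo_domains(feat, offsets).items():
        print(name, [round(x, 2) for x in vec])
```

Each simulated domain yields its own shifted copy of the same feature, so one labeled image can be seen under several pseudo-styles during training; the `strength` factor controls how far the copies move from the original.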
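The normalization perturbation (NP) step named in the summary can be pictured with the following minimal sketch, assuming NP perturbs channel statistics in its usual way: each channel's mean μ and its deviations from that mean are rescaled by random factors β and α drawn around 1, i.e. x' = α(x − μ) + βμ. The channel then has mean βμ and standard deviation |α|σ, while its spatial pattern is kept. The function name, the flattened per-channel layout, and the default `noise_std=0.5` are illustrative assumptions; the record gives neither the paper's noise scale nor which YOLOv8 layers it perturbs.

```python
import random
from typing import List, Optional

# One channel of a feature map, flattened from H x W into a list of activations.
Channel = List[float]


def normalization_perturbation(
    feats: List[Channel],
    noise_std: float = 0.5,  # hypothetical default; not taken from the paper
    rng: Optional[random.Random] = None,
) -> List[Channel]:
    """Randomly rescale each channel's mean and spread (training-time augmentation).

    For a channel with mean mu, every activation v becomes
        alpha * (v - mu) + beta * mu,
    so the new mean is beta * mu and the new std is |alpha| * std,
    while the normalized pattern (v - mu) / std is preserved up to sign.
    """
    if rng is None:
        rng = random.Random()
    perturbed = []
    for channel in feats:
        mu = sum(channel) / len(channel)
        alpha = rng.gauss(1.0, noise_std)  # perturbs the channel's standard deviation
        beta = rng.gauss(1.0, noise_std)   # perturbs the channel's mean
        perturbed.append([alpha * (v - mu) + beta * mu for v in channel])
    return perturbed


if __name__ == "__main__":
    feature_map = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 1.5, 1.5]]  # 2 channels, 2x2 each
    styled = normalization_perturbation(feature_map, rng=random.Random(0))
    for before, after in zip(feature_map, styled):
        print(before, "->", [round(v, 3) for v in after])
```

Because only first- and second-order channel statistics change, the perturbation alters "style" (contrast, brightness-like shifts) without moving object content, which is why it can stand in for unseen domain styles at training time.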
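The speed figures in the summary can be checked with simple arithmetic. Assuming the 65.2% figure is a relative increase in frames per second over the Faster R-CNN-based baseline (the record names neither the baseline method nor the hardware), the implied baseline speed is 38 / 1.652 ≈ 23.0 FPS, and the per-frame latency falls from about 43.5 ms to about 26.3 ms. The helper names are illustrative.

```python
def implied_baseline_fps(new_fps: float, relative_gain_pct: float) -> float:
    """Baseline FPS such that new_fps = baseline * (1 + gain / 100)."""
    return new_fps / (1.0 + relative_gain_pct / 100.0)


def frame_latency_ms(fps: float) -> float:
    """Time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    baseline = implied_baseline_fps(38.0, 65.2)
    # prints "implied baseline: 23.0 FPS (43.5 ms/frame)"
    print(f"implied baseline: {baseline:.1f} FPS ({frame_latency_ms(baseline):.1f} ms/frame)")
    # prints "Y-CPMG: 38.0 FPS (26.3 ms/frame)"
    print(f"Y-CPMG: {38.0:.1f} FPS ({frame_latency_ms(38.0):.1f} ms/frame)")
```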