Inter-Domain Invariant Cross-Domain Object Detection Using Style and Content Disentanglement for In-Vehicle Images
| Published in: | Remote Sensing (Basel, Switzerland), 2024-01, Vol. 16 (2), p. 304 |
|---|---|
| Main Authors: | , , , , , , , |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Summary: | The accurate detection of relevant vehicles, pedestrians, and other targets on the road plays a crucial role in ensuring the safety of autonomous driving. In recent years, object detectors based on Transformers or CNNs have achieved excellent performance in the fully supervised paradigm. However, when the trained model is applied directly to unfamiliar scenes, where the training and testing data have statistically different distributions, its performance may decrease dramatically. To address this issue, unsupervised domain adaptive object detection methods have been proposed. However, these methods often degrade as the gap between the source and target domains widens. Previous works mainly focused on utilizing the style gap to reduce the domain gap while ignoring the content gap. To tackle this challenge, we introduce a novel method, IDI-SCD, that addresses the style and content gaps simultaneously. Firstly, the domain gap is reduced by disentangling it into a style gap and a content gap, generating corresponding intermediate domains at the same time. Secondly, during training, we focus on a single domain gap at a time to achieve inter-domain invariance; that is, the content gap is tackled while the style gap is maintained, and vice versa. In addition, a style-invariant loss is used to narrow the style gap, and a mean teacher self-training framework is used to narrow the content gap. Finally, we introduce a multiscale fusion strategy to enhance the quality of pseudo-labels, which mainly focuses on improving detection performance for extreme-scale (very large or very small) objects. We conduct extensive experiments on four mainstream datasets of in-vehicle images. The experimental results demonstrate the effectiveness of our method and its superiority over most existing methods. |
| ISSN: | 2072-4292 |
| DOI: | 10.3390/rs16020304 |
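
The record carries only the abstract, with no implementation details, but two of the building blocks it names are standard enough to sketch. The first block below is a minimal, assumption-laden PyTorch illustration of a mean teacher update (an exponential moving average of the student's weights) combined with a style-invariant consistency loss between two differently styled views of the same content. `TinyBackbone`, `ema_update`, and `style_invariant_loss` are hypothetical names, and the noise-perturbed image merely stands in for a real style-transferred view; this is a generic sketch, not the authors' IDI-SCD code.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBackbone(nn.Module):
    """Toy stand-in for the detector backbone's feature extractor."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Mean teacher update: teacher <- m * teacher + (1 - m) * student."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(momentum).add_(s_p, alpha=1.0 - momentum)


def style_invariant_loss(features_a, features_b):
    """Penalize feature differences between two styles of the same content."""
    return F.mse_loss(features_a, features_b)


student = TinyBackbone()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is updated only via EMA, never by SGD

img = torch.randn(2, 3, 64, 64)                   # source-domain batch
img_restyled = img + 0.1 * torch.randn_like(img)  # placeholder for a style-transferred view

loss = style_invariant_loss(student(img), student(img_restyled))
loss.backward()
ema_update(teacher, student)
```

The abstract also mentions a multiscale fusion strategy for improving pseudo-label quality on extreme-scale objects. One common way to realize such fusion, shown below under the same caveats, is to pool the teacher's detections from several input scales (mapped back to original-image coordinates), keep only confident boxes, and merge duplicates with non-maximum suppression via `torchvision.ops.nms`; `fuse_multiscale_pseudo_labels` and its thresholds are illustrative assumptions, not the paper's stated procedure.

```python
import torch
from torchvision.ops import nms


def fuse_multiscale_pseudo_labels(boxes_per_scale, scores_per_scale,
                                  score_thr=0.7, iou_thr=0.5):
    """Pool teacher predictions from several input scales, keep confident
    boxes, and merge duplicates with NMS to form pseudo-labels."""
    boxes = torch.cat(boxes_per_scale)    # (N, 4) as (x1, y1, x2, y2) in image coords
    scores = torch.cat(scores_per_scale)  # (N,)
    confident = scores > score_thr
    boxes, scores = boxes[confident], scores[confident]
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]


# Two scales detect roughly the same object; fusion keeps the higher-scoring box.
b1, s1 = torch.tensor([[10., 10., 50., 50.]]), torch.tensor([0.9])
b2, s2 = torch.tensor([[12., 11., 49., 52.]]), torch.tensor([0.8])
boxes, scores = fuse_multiscale_pseudo_labels([b1, b2], [s1, s2])
```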