Self-supervised multi-scale semantic consistency regularization for unsupervised image-to-image translation
Published in: Computer Vision and Image Understanding, 2024-04, Vol. 241, Article 103950
Format: Article
Language: English
Summary: Unsupervised image-to-image translation aims to learn a domain mapping function that preserves the semantics of an input image while adapting its style to target domains without paired data. However, when there is a large semantic mismatch between the source and target domains, current methods often suffer from semantic distortion. Based on dense self-supervised representation learning, a novel Multi-Scale Semantic Consistency Regularization (MSSCR) is presented to alleviate this distortion and enable the generator to produce images with realistic local semantics and consistent structures. During training, MSSCR learns both local and global multi-scale representations from different layers of the discriminator. Concretely, MSSCR slides a fixed-size window over the overlapping region between a pair of views cropped from a single real image, aligns these areas with their corresponding multi-scale representation regions extracted from the discriminator, and then maximizes the similarity of representations between positive pairs. Qualitative and quantitative experiments demonstrate the superiority of MSSCR on image-to-image translation and image generation tasks.
Highlights:
• A novel Multi-Scale Semantic Consistency Regularization based on dense self-supervised learning is proposed.
• Our method helps the generator produce images with realistic local details and global semantic consistency.
• Our method achieves excellent performance both qualitatively and quantitatively.
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2024.103950
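The summary describes the windowed consistency step only at a high level. Below is a minimal, hypothetical PyTorch sketch of that step; the function name, window size, and the assumption that the two overlapping feature regions are already cropped and spatially aligned are ours, not the authors', and the regularizer in the paper may differ in detail.

```python
import torch
import torch.nn.functional as F

def dense_consistency_loss(overlap_a: torch.Tensor,
                           overlap_b: torch.Tensor,
                           win: int = 4) -> torch.Tensor:
    """overlap_a, overlap_b: (C, H, W) features of the shared (overlapping)
    region of two crops of the same real image, taken from one discriminator
    layer. A fixed-size win x win window is pooled over both maps; spatially
    aligned windows form positive pairs whose similarity is maximized."""
    pa = F.avg_pool2d(overlap_a.unsqueeze(0), win, stride=win)  # (1, C, H/win, W/win)
    pb = F.avg_pool2d(overlap_b.unsqueeze(0), win, stride=win)
    pa = pa.flatten(2).squeeze(0).t()  # (num_windows, C)
    pb = pb.flatten(2).squeeze(0).t()
    # Maximizing cosine similarity of aligned windows == minimizing this loss.
    return 1.0 - F.cosine_similarity(pa, pb, dim=1).mean()

# Usage sketch: the multi-scale aspect would come from summing the loss over
# several discriminator layers, e.g.
# loss = sum(dense_consistency_loss(fa, fb) for fa, fb in zip(feats_a, feats_b))
```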