Text to image synthesis with multi-granularity feature aware enhancement Generative Adversarial Networks
Published in: | Computer vision and image understanding 2024-08, Vol.245, p.104042, Article 104042 |
---|---|
Format: | Article |
Language: | English |
Summary: | Synthesizing complex images from text is challenging. Compared to autoregressive and diffusion model-based methods, Generative Adversarial Network-based methods have significant advantages in computational cost and generation efficiency, yet two limitations remain: first, these methods often refine all features output from the previous stage indiscriminately, without considering that these features are initialized gradually during the generation process; second, the sparse semantic constraints provided by the text description are typically ineffective for refining fine-grained features. These issues complicate the balance between generation quality, computational cost, and inference speed. To address them, we propose a Multi-granularity Feature Aware Enhancement GAN (MFAE-GAN), which allows the refinement process to match the order in which features of different granularity are initialized. Specifically, MFAE-GAN (1) samples category-related coarse-grained features and instance-level, detail-related fine-grained features at different generation stages, using different attention mechanisms in Coarse-grained Feature Enhancement (CFE) and Fine-grained Feature Enhancement (FFE), to guide the generation process spatially; (2) provides denser semantic constraints than textual semantic information through Multi-granularity Features Adaptive Batch Normalization (MFA-BN) when refining fine-grained features; and (3) adopts Global Semantics Preservation (GSP) to avoid losing global semantics when sampling features continuously. Extensive experimental results demonstrate that MFAE-GAN is competitive in both image generation quality and efficiency.
• We propose MFAE-GAN, which is competitive in both quality and efficiency.
• CFE and FFE match the order in which features of different granularity are initialized.
• CFE and FFE guide generation spatially by mapping the multi-granularity sketches.
• MFA-BN in FFE provides denser semantic constraints than textual semantic information.
• GSP supplements the global semantics and preserves semantic integrity. |
---|---|
ISSN: | 1077-3142 1090-235X |
DOI: | 10.1016/j.cviu.2024.104042 |
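
The summary above says MFA-BN supplies semantic constraints through adaptive batch normalization, but this record does not give its formulation. As a rough illustration only, the sketch below shows the generic idea of conditional batch normalization: features are normalized per channel over the batch, then scaled and shifted by values predicted from a per-sample conditioning vector. The function names, the `1 + gamma` scaling convention, and the linear prediction of `gamma` and `beta` are all assumptions made for this example, not the authors' MFA-BN.

```python
import math


def batch_norm_stats(x):
    """Per-channel mean and (biased) variance over the batch.

    x is a list of samples; each sample is a list of channel values.
    """
    n = len(x)
    channels = len(x[0])
    means = [sum(row[c] for row in x) / n for c in range(channels)]
    variances = [
        sum((row[c] - means[c]) ** 2 for row in x) / n for c in range(channels)
    ]
    return means, variances


def linear(weights, bias, v):
    """Affine map: one output per row of `weights`."""
    return [
        sum(w * vi for w, vi in zip(row, v)) + b for row, b in zip(weights, bias)
    ]


def conditional_batch_norm(x, cond, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Hypothetical conditional batch norm (not the paper's MFA-BN).

    Each sample is normalized with batch statistics, then modulated by a
    scale and shift predicted from that sample's conditioning vector.
    """
    means, variances = batch_norm_stats(x)
    out = []
    for row, c in zip(x, cond):
        gamma = linear(w_gamma, b_gamma, c)  # per-channel scale offset
        beta = linear(w_beta, b_beta, c)     # per-channel shift
        out.append([
            (1.0 + g) * (v - m) / math.sqrt(var + eps) + b
            for v, m, var, g, b in zip(row, means, variances, gamma, beta)
        ])
    return out


# Two samples, two channels, one-dimensional conditioning vector per sample.
features = [[1.0, 2.0], [3.0, 6.0]]
conditions = [[1.0], [0.0]]
modulated = conditional_batch_norm(
    features, conditions,
    w_gamma=[[1.0], [0.0]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0], [0.5]], b_beta=[0.0, 0.0],
)
```

With an all-zero conditioning vector this reduces to plain batch normalization, which is why the second sample is only normalized while the first is also rescaled and shifted. A denser conditioning signal, in the sense the summary describes, would correspond to a richer `cond` input than a single sentence embedding.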