Background and Visual Feature-Aware Data Augmentation for FGIR via Image Generation
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Fine-Grained Image Recognition (FGIR) involves distinguishing subtle differences within the same category, a challenging task due to high inter-class similarity and intra-class variability. Enhancing accuracy typically requires large, well-labeled datasets, which are difficult to obtain for FGIR. We propose a method to augment datasets using an image generative AI model. We investigated input text prompts indicating target class names with diverse backgrounds and used a multimodal model to incorporate the target class's visual features. Our method also employed an image processing pipeline for background replacement. Our experiments show that while Text-to-Image generation struggles with detailed feature representation, it improves accuracy in one-shot learning scenarios. Additionally, using image generative AI models for background replacement can outperform baseline methods under certain conditions, highlighting the effectiveness of our method. |
ISSN: | 2693-0854 |
DOI: | 10.1109/GCCE62371.2024.10760857 |