Background and Visual Feature-Aware Data Augmentation for FGIR via Image Generation

Bibliographic Details
Main Authors: Kato, Takuya, Serizawa, Shion, Okayama, Mitsuki, Nakano, Yuta, Hasegawa, Tatsuhito
Format: Conference Proceeding
Language: English
Online Access: Request full text
Description
Summary: Fine-Grained Image Recognition (FGIR) involves distinguishing subtle differences within the same category, a challenging task due to high inter-class similarity and intra-class variability. Enhancing accuracy typically requires large, well-labeled datasets, which are difficult to obtain for FGIR. We propose a method to augment datasets using an image generative AI model. We investigated input text prompts indicating target class names with diverse backgrounds and used a multimodal model to incorporate the target class's visual features. Our method also employed an image processing pipeline for background replacement. Our experiments show that while Text-to-Image generation struggles with detailed feature representation, it improves accuracy in one-shot learning scenarios. Additionally, using image generative AI models for background replacement can outperform baseline methods under certain conditions, highlighting the effectiveness of our method.
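The summary only sketches the augmentation pipeline; as a minimal illustration of the background-replacement idea, one augmentation step might look like the following. This assumes Stable Diffusion (via Hugging Face diffusers) for generating the new background and rembg for extracting the foreground subject; the record does not confirm which generative or segmentation models the authors actually used, and the prompt wording is purely illustrative.

# Hedged sketch of one background-replacement augmentation step.
# Model choices (Stable Diffusion, rembg) and prompt text are assumptions,
# not the authors' confirmed pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from rembg import remove  # off-the-shelf background remover (assumed component)

def augment_with_new_background(sample_path: str, background: str) -> Image.Image:
    """Cut the fine-grained subject out of `sample_path` and composite it
    onto a freshly generated background described by `background`."""
    # 1) Text-to-Image generation of a plausible background scene.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    prompt = f"a photo of a {background}, realistic, no subject"  # illustrative prompt
    bg = pipe(prompt).images[0].convert("RGBA")

    # 2) Foreground extraction: rembg returns the subject with an alpha mask.
    fg = remove(Image.open(sample_path).convert("RGBA"))

    # 3) Composite the subject onto the generated background.
    bg = bg.resize(fg.size)
    return Image.alpha_composite(bg, fg).convert("RGB")

# Example usage (hypothetical file and class): one extra training image per
# (original sample, background description) pair.
# aug = augment_with_new_background("cub/laysan_albatross_0001.jpg", "rocky sea cliff")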
ISSN: 2693-0854
DOI: 10.1109/GCCE62371.2024.10760857