Background and Visual Feature-Aware Data Augmentation for FGIR via Image Generation

Bibliographic Details
Main Authors: Kato, Takuya, Serizawa, Shion, Okayama, Mitsuki, Nakano, Yuta, Hasegawa, Tatsuhito
Format: Conference Proceeding
Language: English
Online Access: Request full text
Description
Summary: Fine-Grained Image Recognition (FGIR) involves distinguishing subtle differences within the same category, a challenging task due to high inter-class similarity and intra-class variability. Enhancing accuracy typically requires large, well-labeled datasets, which are difficult to obtain for FGIR. We propose a method to augment datasets using an image generative AI model. We investigated input text prompts indicating target class names with diverse backgrounds and used a multimodal model to incorporate the target class's visual features. Our method also employed an image processing pipeline for background replacement. Our experiments show that while Text-to-Image generation struggles with detailed feature representation, it improves accuracy in one-shot learning scenarios. Additionally, using image generative AI models for background replacement can outperform baseline methods under certain conditions, highlighting the effectiveness of our method.
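The summary only sketches the augmentation pipeline; as a minimal illustration of the background-replacement idea, one augmentation step might look like the following. This assumes Stable Diffusion (via Hugging Face diffusers) for generating the new background and rembg for extracting the foreground subject; the record does not confirm which generative or segmentation models the authors actually used, and the prompt wording is purely illustrative.

# Hedged sketch of one background-replacement augmentation step.
# Model choices (Stable Diffusion, rembg) and prompt text are assumptions,
# not the authors' confirmed pipeline.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from rembg import remove  # off-the-shelf background remover (assumed component)

def augment_with_new_background(sample_path: str, background: str) -> Image.Image:
    """Cut the fine-grained subject out of `sample_path` and composite it
    onto a freshly generated background described by `background`."""
    # 1) Text-to-Image generation of a plausible background scene.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    prompt = f"a photo of a {background}, realistic, no subject"  # illustrative prompt
    bg = pipe(prompt).images[0].convert("RGBA")

    # 2) Foreground extraction: rembg returns the subject with an alpha mask.
    fg = remove(Image.open(sample_path).convert("RGBA"))

    # 3) Composite the subject onto the generated background.
    bg = bg.resize(fg.size)
    return Image.alpha_composite(bg, fg).convert("RGB")

# Example usage (hypothetical file and class): one extra training image per
# (original sample, background description) pair.
# aug = augment_with_new_background("cub/laysan_albatross_0001.jpg", "rocky sea cliff")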
ISSN: 2693-0854
DOI: 10.1109/GCCE62371.2024.10760857