Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel
Format: Conference Proceeding
Language: English
Online Access: Request full text
Summary: Dai ethnic clothing is a unique and distinct subset of minority apparel within China, notable for its rich patterns and distinctive styles. Yet, owing to the complexity and diversity of Dai clothing, generating accurate captions for these outfits is a formidable challenge. To address this issue, this study uses ViT-B from the pre-trained BLIP model to extract features from images of Dai clothing, converting each image into a sequence of feature vectors. These vectors are then decoded by BertLMHeadModel to generate the corresponding text, effectively producing captions for the Dai clothing. To train the model, we constructed a Dai clothing dataset consisting of an image dataset and an image-text annotation dataset. With images and textual descriptions as inputs, model parameters are optimized using cross-entropy loss and AdamW. With this strategy, the model extracts key features from images and generates accurate captions, describing the characteristics of Dai clothing in detail with a high degree of semantic accuracy.
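The encode-then-decode pipeline the abstract describes can be sketched with the Hugging Face `transformers` BLIP implementation, whose BLIP-base checkpoint pairs a ViT-B/16 vision encoder with a BERT-style LM-head decoder, mirroring the ViT-B + BertLMHeadModel pairing above. The checkpoint name, image path, and generation length below are illustrative assumptions, not details taken from the paper.

```python
# Minimal captioning sketch: ViT-B encoder -> BERT LM-head decoder via BLIP.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint: BLIP-base uses a ViT-B/16 image encoder and a
# BERT-style decoder with a language-modeling head.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

image = Image.open("dai_clothing_example.jpg").convert("RGB")  # hypothetical path

# The ViT encoder turns the image into a sequence of patch feature vectors;
# the decoder autoregressively generates a caption conditioned on them.
inputs = processor(images=image, return_tensors="pt").to(device)
caption_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```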
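The training step the abstract mentions, optimizing with cross-entropy loss and AdamW over image-text pairs, might look roughly like the following. The dataset iterable `dai_clothing_pairs`, the learning rate, and the single-example batching are assumptions for illustration only.

```python
# Fine-tuning sketch: cross-entropy loss (computed internally when `labels`
# are passed) optimized with AdamW, as described in the abstract.
import torch
from torch.optim import AdamW
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)
optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

model.train()
for image, text in dai_clothing_pairs:  # hypothetical iterable of (PIL image, caption)
    batch = processor(images=image, text=text, return_tensors="pt").to(device)
    # Passing `labels` makes the model compute token-level cross-entropy
    # between the decoder logits and the reference caption.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```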
ISSN: 2327-0594
DOI: 10.1109/ICSESS58500.2023.10293038