
Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel

Dai ethnic clothing is a unique and distinctive subset of minority apparel within China, notable for its rich patterns and styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study uses ViT-B from the pre-trained BLIP model to extract features from images of Dai clothing, converting each image into a sequence of feature vectors. These vectors are then decoded by BertLMHeadModel to generate the corresponding text captions. To train the model, we constructed a Dai clothing dataset comprising a Dai clothing image dataset and an image-text annotation dataset. With images and textual descriptions serving as inputs, model parameters are optimized using cross-entropy loss and the AdamW optimizer. With this strategy, the model extracts key features from images and generates accurate, detailed captions of the characteristics of Dai clothing with a high degree of semantic accuracy.
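As a rough illustration of the pipeline described above, the sketch below runs a ViT-B-based BLIP encoder-decoder for image captioning with the Hugging Face transformers library. This is a minimal sketch under stated assumptions: the public "Salesforce/blip-image-captioning-base" checkpoint stands in for the authors' fine-tuned model, and the image file name is hypothetical.

```python
# Minimal captioning sketch: a ViT-B vision encoder feeds a BERT-style
# language-model decoder, mirroring the ViT-B + BertLMHeadModel setup above.
# Assumption: the public BLIP checkpoint, not the authors' fine-tuned weights.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("dai_clothing_example.jpg").convert("RGB")  # hypothetical image file

# The ViT-B encoder turns the image into a sequence of patch embeddings;
# the language-model head decodes them into a text caption.
inputs = processor(images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(generated_ids[0], skip_special_tokens=True))
```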


Saved in:
Bibliographic Details
Main Authors: Feng, Zuwei, Wen, Bin, Deng, Hongfei
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
cited_by
cites
container_end_page 216
container_issue
container_start_page 212
container_title
container_volume
creator Feng, Zuwei
Wen, Bin
Deng, Hongfei
description Dai ethnic clothing is a unique and distinctive subset of minority apparel within China, notable for its rich patterns and styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study utilizes ViT-B from the pre-trained BLIP model to perform feature extraction on images of Dai clothing, thereby converting them into a sequence of feature vectors. Subsequently, these vectors are decoded using BertLMHeadModel to generate the corresponding text captions for the Dai clothing. For the training of our model, we constructed a Dai clothing dataset consisting of a Dai clothing image dataset and an image-text annotation dataset. With images and textual descriptions serving as inputs, model parameters are optimized using cross-entropy loss and the AdamW optimizer. With this strategy, our model is capable of extracting key features from images and generating accurate captions, providing detailed descriptions of the characteristics of Dai clothing with a high degree of semantic accuracy.
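The description above names the training recipe: token-level cross-entropy loss optimized with AdamW. A hedged sketch of a single fine-tuning step follows, again assuming the public Hugging Face BLIP checkpoint; the learning rate, file name, and caption text are illustrative placeholders, not values reported in the paper.

```python
# One fine-tuning step with cross-entropy loss and AdamW, as described above.
# All concrete values (checkpoint, lr, sample) are assumptions for illustration.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed learning rate

image = Image.open("dai_clothing_001.jpg").convert("RGB")        # hypothetical training image
caption = "A Dai-style tube skirt with woven peacock patterns."  # hypothetical annotation

model.train()
inputs = processor(images=image, text=caption, return_tensors="pt")
# Passing the caption tokens as labels makes the model return the
# cross-entropy loss over the predicted caption tokens.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```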
doi_str_mv 10.1109/ICSESS58500.2023.10293038
format conference_proceeding
fulltext fulltext_linktorsrc
identifier ISBN: 9798350336269
ispartof 2023 IEEE 14th International Conference on Software Engineering and Service Science (ICSESS), 2023, p.212-216
issn 2327-0594
language eng
recordid cdi_ieee_primary_10293038
source IEEE Xplore All Conference Series
subjects Annotations
BertLMHeadModel
BLIP model
Complexity theory
component
Dai Ethnic Clothing
Feature extraction
Image Caption
Measurement
Semantics
Training
ViT-B
title Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel