Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel
Dai ethnic clothing is a unique and distinct subset of minority apparel within China, impressive for its rich patterns and unique styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study uses the ViT-B encoder from the pre-trained BLIP model to extract image features and BertLMHeadModel to decode them into text captions for Dai clothing.
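The abstract above describes an encoder-decoder pipeline: a ViT-B image encoder turns a clothing photo into a sequence of feature vectors, and a BertLMHeadModel decoder generates a caption from them. The sketch below shows one way to wire such a pipeline with Hugging Face's VisionEncoderDecoderModel; it is a minimal illustration under assumed checkpoints (a generic ViT-B and bert-base-uncased), not the authors' BLIP-based code, and the image path is hypothetical.

```python
# Minimal sketch (not the paper's code): a ViT-B encoder paired with a
# BertLMHeadModel-style decoder for image caption generation.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, BertTokenizer

# Assumed stand-in checkpoints; the paper initializes its encoder from BLIP's ViT-B.
ENCODER = "google/vit-base-patch16-224-in21k"
DECODER = "bert-base-uncased"

# The encoder emits patch-level feature vectors; the BERT decoder is loaded as a
# BertLMHeadModel with cross-attention so it can attend to those vectors.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(ENCODER, DECODER)
processor = ViTImageProcessor.from_pretrained(ENCODER)
tokenizer = BertTokenizer.from_pretrained(DECODER)

# Tell the decoder which token ids start and pad generated captions.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical input image of Dai clothing.
image = Image.open("dai_clothing_example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam-search decoding of a caption from the visual features.
caption_ids = model.generate(pixel_values, max_length=64, num_beams=4)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))
```

When a plain BERT checkpoint is passed as the decoder, VisionEncoderDecoderModel instantiates it as a BertLMHeadModel with cross-attention enabled, which mirrors the encoder-decoder split described in the abstract.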
Main Authors: | Feng, Zuwei; Wen, Bin; Deng, Hongfei |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Annotations; BertLMHeadModel; BLIP model; Complexity theory; component; Dai Ethnic Clothing; Feature extraction; Image Caption; Measurement; Semantics; Training; ViT-B |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 216 |
container_issue | |
container_start_page | 212 |
container_title | |
container_volume | |
creator | Feng, Zuwei; Wen, Bin; Deng, Hongfei |
description | Dai ethnic clothing is a unique and distinct subset of minority apparel within China, impressive for its rich patterns and unique styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study uses ViT-B from the pre-trained BLIP model to extract features from images of Dai clothing, converting each image into a sequence of feature vectors. These vectors are then decoded by BertLMHeadModel to generate the corresponding text captions for the Dai clothing. To train the model, we constructed a Dai clothing dataset consisting of a Dai clothing image dataset and an image-text annotation dataset. With images and textual descriptions serving as inputs, model parameters are optimized using cross-entropy loss and AdamW. With this strategy, our model extracts key features from images and generates accurate, detailed captions of the characteristics of Dai clothing with a high degree of semantic accuracy. |
doi_str_mv | 10.1109/ICSESS58500.2023.10293038 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798350336269 |
ispartof | 2023 IEEE 14th International Conference on Software Engineering and Service Science (ICSESS), 2023, p.212-216 |
issn | 2327-0594 |
language | eng |
recordid | cdi_ieee_primary_10293038 |
source | IEEE Xplore All Conference Series |
subjects | Annotations; BertLMHeadModel; BLIP model; Complexity theory; component; Dai Ethnic Clothing; Feature extraction; Image Caption; Measurement; Semantics; Training; ViT-B |
title | Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel |
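The description field above notes that model parameters are optimized with cross-entropy loss and AdamW on paired images and textual annotations. The sketch below illustrates one such fine-tuning step, reusing the `model`, `processor`, and `tokenizer` objects from the earlier sketch; the learning rate, weight decay, and caption length are illustrative assumptions, not values reported in the paper.

```python
# Illustrative training step (assumed setup, not the paper's exact recipe):
# token-level cross-entropy over reference captions, optimized with AdamW.
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)  # assumed hyperparameters
model.train()

def training_step(images, captions):
    """One gradient update on a batch of PIL images and their reference captions."""
    # Encode images into pixel tensors and captions into label ids.
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    labels = tokenizer(
        captions, padding=True, truncation=True, max_length=64, return_tensors="pt"
    ).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # padding tokens are ignored by the loss

    # With labels supplied, the model computes the cross-entropy loss internally.
    outputs = model(pixel_values=pixel_values, labels=labels)
    loss = outputs.loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```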