Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel
Dai ethnic clothing is a unique and distinct subset of minority apparel within China, impressive for its rich patterns and unique styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study uses the ViT-B encoder from the pre-trained BLIP model to extract image features and BertLMHeadModel to decode them into text captions for Dai clothing.
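The abstract above describes an encoder-decoder pipeline: a ViT-B image encoder turns a clothing photo into a sequence of feature vectors, and a BertLMHeadModel decoder generates a caption from them. The sketch below shows one way to wire such a pipeline with Hugging Face's VisionEncoderDecoderModel; it is a minimal illustration under assumed checkpoints (a generic ViT-B and bert-base-uncased), not the authors' BLIP-based code, and the image path is hypothetical.

```python
# Minimal sketch (not the paper's code): a ViT-B encoder paired with a
# BertLMHeadModel-style decoder for image caption generation.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, BertTokenizer

# Assumed stand-in checkpoints; the paper initializes its encoder from BLIP's ViT-B.
ENCODER = "google/vit-base-patch16-224-in21k"
DECODER = "bert-base-uncased"

# The encoder emits patch-level feature vectors; the BERT decoder is loaded as a
# BertLMHeadModel with cross-attention so it can attend to those vectors.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(ENCODER, DECODER)
processor = ViTImageProcessor.from_pretrained(ENCODER)
tokenizer = BertTokenizer.from_pretrained(DECODER)

# Tell the decoder which token ids start and pad generated captions.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical input image of Dai clothing.
image = Image.open("dai_clothing_example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Beam-search decoding of a caption from the visual features.
caption_ids = model.generate(pixel_values, max_length=64, num_beams=4)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))
```

When a plain BERT checkpoint is passed as the decoder, VisionEncoderDecoderModel instantiates it as a BertLMHeadModel with cross-attention enabled, which mirrors the encoder-decoder split described in the abstract.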
Main Authors: | Feng, Zuwei; Wen, Bin; Deng, Hongfei |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Annotations; BertLMHeadModel; BLIP model; Complexity theory; component; Dai Ethnic Clothing; Feature extraction; Image Caption; Measurement; Semantics; Training; ViT-B |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 216 |
container_issue | |
container_start_page | 212 |
container_title | |
container_volume | |
creator | Feng, Zuwei; Wen, Bin; Deng, Hongfei |
description | Dai ethnic clothing is a unique and distinct subset of minority apparel within China, impressive for its rich patterns and unique styles. Yet, due to the complexity and diversity of Dai clothing, providing accurate captions for these outfits is a formidable challenge. To address this issue, this study uses ViT-B from the pre-trained BLIP model to extract features from images of Dai clothing, converting each image into a sequence of feature vectors. These vectors are then decoded by BertLMHeadModel to generate the corresponding text captions for the Dai clothing. To train the model, we constructed a Dai clothing dataset consisting of a Dai clothing image dataset and an image-text annotation dataset. With images and textual descriptions serving as inputs, model parameters are optimized using cross-entropy loss and AdamW. With this strategy, our model extracts key features from images and generates accurate, detailed captions of the characteristics of Dai clothing with a high degree of semantic accuracy. |
doi_str_mv | 10.1109/ICSESS58500.2023.10293038 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798350336269 |
ispartof | 2023 IEEE 14th International Conference on Software Engineering and Service Science (ICSESS), 2023, p.212-216 |
issn | 2327-0594 |
language | eng |
recordid | cdi_ieee_primary_10293038 |
source | IEEE Xplore All Conference Series |
subjects | Annotations; BertLMHeadModel; BLIP model; Complexity theory; component; Dai Ethnic Clothing; Feature extraction; Image Caption; Measurement; Semantics; Training; ViT-B |
title | Image Caption Generation for Dai Ethnic Clothing Based on ViT-B and BertLMHeadModel |
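The description field above notes that model parameters are optimized with cross-entropy loss and AdamW on paired images and textual annotations. The sketch below illustrates one such fine-tuning step, reusing the `model`, `processor`, and `tokenizer` objects from the earlier sketch; the learning rate, weight decay, and caption length are illustrative assumptions, not values reported in the paper.

```python
# Illustrative training step (assumed setup, not the paper's exact recipe):
# token-level cross-entropy over reference captions, optimized with AdamW.
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)  # assumed hyperparameters
model.train()

def training_step(images, captions):
    """One gradient update on a batch of PIL images and their reference captions."""
    # Encode images into pixel tensors and captions into label ids.
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    labels = tokenizer(
        captions, padding=True, truncation=True, max_length=64, return_tensors="pt"
    ).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # padding tokens are ignored by the loss

    # With labels supplied, the model computes the cross-entropy loss internally.
    outputs = model(pixel_values=pixel_values, labels=labels)
    loss = outputs.loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```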