
Thangka image captioning model with Salient Attention and Local Interaction Aggregator

Thangka image captioning aims to automatically generate accurate and complete sentences that describe the main content of Thangka images. However, existing methods fall short in capturing the features of the core deity regions and the surrounding background details of Thangka images, and they significantly lack an understanding of local actions and interactions within the images. To address these issues, this paper proposes a Thangka image captioning model based on Salient Attention and Local Interaction Aggregator (SALIA). The model is designed with a Dual-Branch Salient Attention Module (DBSA) to accurately capture the expressions and decorations of the deity and the descriptive background elements, and it introduces a Local Interaction Aggregator (LIA) to achieve detailed analysis of the characters’ actions, facial expressions, and the complex interactions with surrounding elements in Thangka images. Experimental results show that SALIA outperforms other state-of-the-art methods in both qualitative and quantitative evaluations of Thangka image captioning, achieving BLEU4: 94.0%, ROUGE_L: 95.0%, and CIDEr: 909.8% on the D-Thangka dataset, and BLEU4: 22.2% and ROUGE_L: 47.2% on the Flickr8k dataset.
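
The abstract names two architectural ideas: a dual-branch attention that treats the salient deity region and the background separately, and a local aggregator that mixes each region's features with those of its neighbours. This record contains no code, so the following PyTorch sketch is only an illustrative reading of those two ideas; the class names, tensor shapes, masking scheme, and convolution window are assumptions for exposition, not the authors' SALIA implementation.

```python
# Illustrative sketch only -- not the authors' SALIA code. Shapes, masking and
# fusion choices below are assumptions made for exposition.
import torch
import torch.nn as nn


class DualBranchSalientAttention(nn.Module):
    """Toy DBSA-style block: one attention branch restricted to salient
    (deity) regions, one restricted to background regions, then fused."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.salient_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.background_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, feats: torch.Tensor, salient_mask: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) region features; salient_mask: (B, N) bool,
        # True where a region belongs to the central deity.
        sal, _ = self.salient_attn(feats, feats, feats,
                                   key_padding_mask=~salient_mask)
        bg, _ = self.background_attn(feats, feats, feats,
                                     key_padding_mask=salient_mask)
        return self.fuse(torch.cat([sal, bg], dim=-1))


class LocalInteractionAggregator(nn.Module):
    """Toy LIA-style block: each region feature is aggregated with its
    neighbours via a small depthwise convolution plus a residual."""

    def __init__(self, dim: int, window: int = 3):
        super().__init__()
        self.local = nn.Conv1d(dim, dim, kernel_size=window,
                               padding=window // 2, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        mixed = self.local(feats.transpose(1, 2)).transpose(1, 2)
        return self.norm(feats + mixed)


if __name__ == "__main__":
    # Hypothetical shapes: 2 images, 49 regions, 512-d features.
    feats = torch.randn(2, 49, 512)
    mask = torch.zeros(2, 49, dtype=torch.bool)
    mask[:, :10] = True          # pretend the first 10 regions cover the deity
    x = DualBranchSalientAttention(512)(feats, mask)
    x = LocalInteractionAggregator(512)(x)
    print(x.shape)               # torch.Size([2, 49, 512])
```

In a full captioning pipeline, the output of such blocks would feed a caption decoder; how SALIA actually wires these components is described in the article itself, not in this record.

The abstract also reports BLEU-4, ROUGE_L and CIDEr scores. BLEU and ROUGE are conventionally reported as percentages, while CIDEr is often reported multiplied by 100 and is not bounded by 100, which is why a value such as 909.8% is possible. A minimal way to compute BLEU-4 and ROUGE-L for a single generated caption, assuming the standard `nltk` and `rouge-score` packages (these are generic metric libraries, not the paper's evaluation code), is:

```python
# Hedged example using off-the-shelf metric implementations.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "a seated deity holds a lotus surrounded by flames"   # made-up caption
candidate = "a seated deity holding a lotus among flames"         # made-up caption

# BLEU-4: geometric mean of 1- to 4-gram precisions, with smoothing.
bleu4 = sentence_bleu([reference.split()], candidate.split(),
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence F-measure.
rougeL = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU-4: {bleu4:.3f}  ROUGE-L: {rougeL:.3f}")
```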

Bibliographic Details
Published in: Heritage Science, 2024-11-20, Vol. 12 (1), Article 407 (21 pages)
Main Authors: Hu, Wenjin; Zhang, Fujun; Zhao, Yinqiu
Format: Article
Language: English
Publisher: Springer International Publishing, Cham
DOI: 10.1186/s40494-024-01518-5
ISSN/EISSN: 2050-7445
Rights: © The Author(s) 2024. Open access under the Creative Commons Attribution 4.0 licence (CC BY 4.0).
Subjects: Algorithms; Chemistry and Materials Science; Cultural heritage; Culture; Datasets; Deep learning; Deities; Design; Digital preservation; Dual-Branch Salient Attention; Image captioning; Local Interaction Aggregator; Materials Science; Qualitative analysis; Science; Semantics; Thangka
Online Access:Get full text