
Deep Captioning with Attention-Based Visual Concept Transfer Mechanism for Enriching Description

In this paper, we propose a novel deep captioning framework called Attention-based multimodal recurrent neural network with Visual Concept Transfer Mechanism (A-VCTM). The proposed A-VCTM has three advantages. (1) A multimodal layer integrates the visual representation and the context representation, building a bridge that directly connects context information with visual information. (2) An attention mechanism leads the model to focus on the image regions corresponding to the next word to be generated. (3) A visual concept transfer mechanism generates novel visual concepts and enriches the description sentences. Qualitative and quantitative results on two standard benchmarks, MSCOCO and Flickr30K, show the effectiveness and practicability of the proposed A-VCTM framework.
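The abstract describes three components: a multimodal layer fusing visual and context representations, a region-level attention mechanism, and a visual concept transfer mechanism. A minimal numerical sketch of the first two steps (attention-weighted region pooling followed by multimodal fusion) is shown below; the additive-attention form, the tanh nonlinearity, and all dimensions are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
R, d_v, d_h, d_m = 6, 8, 8, 10   # regions, visual dim, context dim, multimodal dim

V = rng.normal(size=(R, d_v))    # one feature vector per image region
h = rng.normal(size=(d_h,))      # context representation from the recurrent layer

# Additive attention: score each region against the current context,
# so the model can focus on regions tied to the next word.
W_v = rng.normal(size=(d_v, d_m))
W_h = rng.normal(size=(d_h, d_m))
w_a = rng.normal(size=(d_m,))

scores = np.tanh(V @ W_v + h @ W_h) @ w_a   # (R,) one score per region
alpha = softmax(scores)                     # attention weights, sum to 1
v_att = alpha @ V                           # attended visual representation, (d_v,)

# Multimodal layer: project both modalities into a shared space and merge,
# bridging context information and visual information directly.
M_v = rng.normal(size=(d_v, d_m))
M_h = rng.normal(size=(d_h, d_m))
m = np.tanh(v_att @ M_v + h @ M_h)          # fused multimodal vector, (d_m,)

print(alpha.shape, m.shape)
```

In a full captioning model, `m` would feed a softmax over the vocabulary to predict the next word, and the weights would be learned rather than randomly drawn.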
Bibliographic Details
Published in: Neural Processing Letters, 2019-10, Vol. 50 (2), p. 1891-1905
Main Authors: Zhang, Junxuan; Hu, Haifeng
Format: Article
Language: English
Subjects: Artificial Intelligence; Bridge construction; Complex Systems; Computational Intelligence; Computer Science; Context; Probability distribution; Recurrent neural networks; Representations; Semantics
ISSN: 1370-4621
EISSN: 1573-773X
DOI: 10.1007/s11063-019-09978-8
Publisher: New York: Springer US