
Deep Captioning with Attention-Based Visual Concept Transfer Mechanism for Enriching Description

In this paper, we propose a novel deep captioning framework called Attention-based multimodal recurrent neural network with Visual Concept Transfer Mechanism (A-VCTM). The proposed A-VCTM has three advantages. (1) A multimodal layer integrates the visual representation and the context representation, building a bridge that directly connects context information with visual information. (2) An attention mechanism leads the model to focus on the image regions corresponding to the next word to be generated. (3) A visual concept transfer mechanism generates novel visual concepts and enriches the description sentences. Qualitative and quantitative results on two standard benchmarks, MSCOCO and Flickr30K, show the effectiveness and practicability of the proposed A-VCTM framework.
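The abstract describes three components: a multimodal layer fusing visual and context representations, a region-level attention mechanism, and a visual concept transfer mechanism. A minimal numerical sketch of the first two steps (attention-weighted region pooling followed by multimodal fusion) is shown below; the additive-attention form, the tanh nonlinearity, and all dimensions are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper):
R, d_v, d_h, d_m = 6, 8, 8, 10   # regions, visual dim, context dim, multimodal dim

V = rng.normal(size=(R, d_v))    # one feature vector per image region
h = rng.normal(size=(d_h,))      # context representation from the recurrent layer

# Additive attention: score each region against the current context,
# so the model can focus on regions tied to the next word.
W_v = rng.normal(size=(d_v, d_m))
W_h = rng.normal(size=(d_h, d_m))
w_a = rng.normal(size=(d_m,))

scores = np.tanh(V @ W_v + h @ W_h) @ w_a   # (R,) one score per region
alpha = softmax(scores)                     # attention weights, sum to 1
v_att = alpha @ V                           # attended visual representation, (d_v,)

# Multimodal layer: project both modalities into a shared space and merge,
# bridging context information and visual information directly.
M_v = rng.normal(size=(d_v, d_m))
M_h = rng.normal(size=(d_h, d_m))
m = np.tanh(v_att @ M_v + h @ M_h)          # fused multimodal vector, (d_m,)

print(alpha.shape, m.shape)
```

In a full captioning model, `m` would feed a softmax over the vocabulary to predict the next word, and the weights would be learned rather than randomly drawn.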
Bibliographic Details
Published in: Neural Processing Letters, 2019-10, Vol. 50 (2), p. 1891-1905
Main Authors: Zhang, Junxuan; Hu, Haifeng
Format: Article
Language: English
Subjects: Artificial Intelligence; Bridge construction; Complex Systems; Computational Intelligence; Computer Science; Context; Probability distribution; Recurrent neural networks; Representations; Semantics
ISSN: 1370-4621
EISSN: 1573-773X
DOI: 10.1007/s11063-019-09978-8
Publisher: New York: Springer US