Deep Captioning with Attention-Based Visual Concept Transfer Mechanism for Enriching Description
In this paper, we propose a novel deep captioning framework called Attention-based multimodal recurrent neural network with Visual Concept Transfer Mechanism (A-VCTM). The proposed A-VCTM has three advantages. (1) A multimodal layer integrates the visual representation and the context representation, building a bridge that connects context information directly with visual information. (2) An attention mechanism leads the model to focus on the image regions corresponding to the next word to be generated. (3) A visual concept transfer mechanism generates novel visual concepts and enriches the description sentences. Qualitative and quantitative results on two standard benchmarks, MSCOCO and Flickr30K, show the effectiveness and practicability of the proposed A-VCTM framework.
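The two core ingredients the abstract describes, region-level attention conditioned on the decoder state and a multimodal layer that fuses the attended visual vector with the context vector, can be sketched as follows. This is a minimal illustrative sketch only: the function name, weight matrices, and dimensions are assumptions for illustration, not the paper's actual architecture or parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_and_fuse(regions, h, W_att, W_v, W_h):
    """One decoding step of attention + multimodal fusion.

    Scores each image region against the current hidden (context)
    state, pools the regions by their attention weights, then merges
    the pooled visual vector with the context vector in a joint
    multimodal layer. All names and shapes are hypothetical.
    """
    # Attention: one score per region, conditioned on the hidden state h.
    scores = regions @ (W_att @ h)          # (num_regions,)
    alpha = softmax(scores)                 # attention distribution over regions
    v = alpha @ regions                     # attended visual vector
    # Multimodal layer: project both modalities into a shared space
    # and merge them, bridging context and visual information.
    m = np.tanh(W_v @ v + W_h @ h)
    return m, alpha

rng = np.random.default_rng(0)
regions = rng.normal(size=(49, 512))   # e.g. a 7x7 CNN feature map, flattened
h = rng.normal(size=256)               # decoder hidden (context) state
W_att = rng.normal(size=(512, 256))
W_v = rng.normal(size=(128, 512))
W_h = rng.normal(size=(128, 256))
m, alpha = attend_and_fuse(regions, h, W_att, W_v, W_h)
```

In a full captioner the fused vector `m` would feed the softmax that predicts the next word, and `alpha` shows which regions the model attended to for that word.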
Published in: | Neural Processing Letters, 2019-10, Vol. 50 (2), p. 1891-1905 |
---|---|
Main Authors: | Zhang, Junxuan; Hu, Haifeng |
Format: | Article |
Language: | English |
Subjects: | Artificial Intelligence; Bridge construction; Complex Systems; Computational Intelligence; Computer Science; Context; Probability distribution; Recurrent neural networks; Representations; Semantics |
DOI: | 10.1007/s11063-019-09978-8 |
creator | Zhang, Junxuan; Hu, Haifeng |
doi_str_mv | 10.1007/s11063-019-09978-8 |
format | article |
identifier | ISSN: 1370-4621 |
ispartof | Neural processing letters, 2019-10, Vol.50 (2), p.1891-1905 |
issn | 1370-4621 (print); 1573-773X (electronic) |
language | eng |
source | Springer Nature |
subjects | Artificial Intelligence; Bridge construction; Complex Systems; Computational Intelligence; Computer Science; Context; Probability distribution; Recurrent neural networks; Representations; Semantics |
title | Deep Captioning with Attention-Based Visual Concept Transfer Mechanism for Enriching Description |