A parallel-fusion RNN-LSTM architecture for image caption generation

Models based on deep convolutional networks and recurrent neural networks have dominated recent image caption generation tasks, yet the trade-off between performance and complexity remains an open question. Inspired by recent work, we combine the advantages of a simple RNN and an LSTM in a novel parallel-fusion RNN-LSTM architecture that obtains better results than the dominant structure while also improving efficiency. The proposed approach divides the hidden units of the RNN into several same-size parts and lets them work in parallel, then merges their outputs with corresponding ratios to generate the final result. Moreover, these parts can be different types of RNNs, for instance a simple RNN and an LSTM. Training normally on the NeuralTalk platform with the Flickr8k dataset and no additional training data, we obtain better results than the dominant structure; in particular, the proposed model surpasses GoogleNIC in image caption generation.
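
The abstract describes the fusion mechanism only in prose, so here is a minimal sketch of the idea in PyTorch, assuming two equal-size parallel branches (one simple RNN, one LSTM) whose outputs are merged with fixed ratios. The class name, layer sizes, and the 0.5/0.5 merge ratios are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn as nn

    class ParallelFusionRNNLSTM(nn.Module):
        """Two same-size recurrent branches run in parallel over the
        input sequence; their outputs are merged with corresponding
        ratios, as described in the abstract."""

        def __init__(self, input_size, hidden_size, ratios=(0.5, 0.5)):
            super().__init__()
            # Parallel branches: one simple RNN and one LSTM of equal size.
            self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
            self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
            self.ratios = ratios  # assumed fixed merge weights

        def forward(self, x):
            rnn_out, _ = self.rnn(x)    # (batch, seq, hidden)
            lstm_out, _ = self.lstm(x)  # (batch, seq, hidden)
            # Merge the parallel branch outputs with their ratios.
            return self.ratios[0] * rnn_out + self.ratios[1] * lstm_out

    # Example: embeddings of size 256, fused recurrent state of size 512.
    model = ParallelFusionRNNLSTM(input_size=256, hidden_size=512)
    out = model(torch.randn(4, 20, 256))  # -> shape (4, 20, 512)

In the paper's caption generator this fused recurrent state would feed a word-prediction layer at each time step; only the split-and-merge of the recurrent units is sketched here.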

Bibliographic Details
Main Authors: Minsi Wang, Li Song, Xiaokang Yang, Chuanfei Luo
Format: Conference Proceeding
Language: English
Published in: 2016 IEEE International Conference on Image Processing (ICIP), 2016, p.4448-4452
DOI: 10.1109/ICIP.2016.7533201
EISSN: 2381-8549
Subjects: Computational modeling; Data models; deep neural network; Feature extraction; Image captioning; LSTM; Measurement; Recurrent neural networks; RNN; Training
Online Access: https://ieeexplore.ieee.org/document/7533201