
A Novel Attribute Selection Mechanism for Video Captioning

Bibliographic Details
Main Authors: Xiao, Huanhou, Shi, Jinglun
Format: Conference Proceeding
Language: English
container_end_page 623
container_start_page 619
creator Xiao, Huanhou
Shi, Jinglun
description Attributes are more and more popular for enhancing the performance of video captioning which requires semantic understanding of videos and the ability of generating natural language descriptions. However, existing methods have flaws in detecting visual attributes. As a result, the captioning model may be misled by those erroneous attributes. Besides, each semantic attribute plays a different role in the next-word generation. How to utilize them effectively in video captioning task is still a challenge. To tackle these problems, in this paper, we propose a novel framework which imposes an attention mechanism guided by the visual attention on the detected video attributes to make a soft-selection over them. Simultaneously, the reinforcement learning algorithm is employed with the motivation to better select the useful attributes. Experimental results on benchmark datasets demonstrate that the proposed attribute selection mechanism can focus on appropriate attributes and boost the caption models.
doi_str_mv 10.1109/ICIP.2019.8803785
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8803785</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8803785</ieee_id><sourcerecordid>8803785</sourcerecordid><originalsourceid>FETCH-LOGICAL-i203t-6f8a1b5bbd09c47b24691c4c8997254399bb2ef756e71cba311d3a6ade4d8c0d3</originalsourceid><addsrcrecordid>eNotj8lKxEAURUtBsG37A8RN_UBivZrLXQgOgXYAh21Tw4uWpJMmiYJ_r2Kv7uIcDlxCzoCVAMxdNHXzWHIGrrSWCWPVAVk5Y0EJqzWXTh-SBRcWCqukOyYn0_TB2K8vYEEuK3o_fGFHq3kec_ickT5hh3HOQ0_vML77Pk9b2g4jfc0JB1r73R_L_dspOWp9N-Fqv0vycn31XN8W64ebpq7WReZMzIVurYegQkjMRWkCl9pBlNE6Z7iSwrkQOLZGaTQQgxcASXjtE8pkI0tiSc7_uxkRN7sxb_34vdlfFT-eykc9</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>A Novel Attribute Selection Mechanism for Video Captioning</title><source>IEEE Xplore All Conference Series</source><creator>Xiao, Huanhou ; Shi, Jinglun</creator><creatorcontrib>Xiao, Huanhou ; Shi, Jinglun</creatorcontrib><description>Attributes are more and more popular for enhancing the performance of video captioning which requires semantic understanding of videos and the ability of generating natural language descriptions. However, existing methods have flaws in detecting visual attributes. As a result, the captioning model may be misled by those erroneous attributes. Besides, each semantic attribute plays a different role in the next-word generation. How to utilize them effectively in video captioning task is still a challenge. To tackle these problems, in this paper, we propose a novel framework which imposes an attention mechanism guided by the visual attention on the detected video attributes to make a soft-selection over them. Simultaneously, the reinforcement learning algorithm is employed with the motivation to better select the useful attributes. 
Experimental results on benchmark datasets demonstrate that the proposed attribute selection mechanism can focus on appropriate attributes and boost the caption models.</description><identifier>EISSN: 2381-8549</identifier><identifier>EISBN: 9781538662496</identifier><identifier>EISBN: 1538662493</identifier><identifier>DOI: 10.1109/ICIP.2019.8803785</identifier><language>eng</language><publisher>IEEE</publisher><subject>Attention ; Attributes ; Decoding ; Detectors ; Feature extraction ; Reinforcement learning ; Semantics ; Task analysis ; Training ; Video captioning ; Visualization</subject><ispartof>2019 IEEE International Conference on Image Processing (ICIP), 2019, p.619-623</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8803785$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8803785$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Xiao, Huanhou</creatorcontrib><creatorcontrib>Shi, Jinglun</creatorcontrib><title>A Novel Attribute Selection Mechanism for Video Captioning</title><title>2019 IEEE International Conference on Image Processing (ICIP)</title><addtitle>ICIP</addtitle><description>Attributes are more and more popular for enhancing the performance of video captioning which requires semantic understanding of videos and the ability of generating natural language descriptions. However, existing methods have flaws in detecting visual attributes. As a result, the captioning model may be misled by those erroneous attributes. Besides, each semantic attribute plays a different role in the next-word generation. 
How to utilize them effectively in video captioning task is still a challenge. To tackle these problems, in this paper, we propose a novel framework which imposes an attention mechanism guided by the visual attention on the detected video attributes to make a soft-selection over them. Simultaneously, the reinforcement learning algorithm is employed with the motivation to better select the useful attributes. Experimental results on benchmark datasets demonstrate that the proposed attribute selection mechanism can focus on appropriate attributes and boost the caption models.</description><subject>Attention</subject><subject>Attributes</subject><subject>Decoding</subject><subject>Detectors</subject><subject>Feature extraction</subject><subject>Reinforcement learning</subject><subject>Semantics</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video captioning</subject><subject>Visualization</subject><issn>2381-8549</issn><isbn>9781538662496</isbn><isbn>1538662493</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2019</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj8lKxEAURUtBsG37A8RN_UBivZrLXQgOgXYAh21Tw4uWpJMmiYJ_r2Kv7uIcDlxCzoCVAMxdNHXzWHIGrrSWCWPVAVk5Y0EJqzWXTh-SBRcWCqukOyYn0_TB2K8vYEEuK3o_fGFHq3kec_ickT5hh3HOQ0_vML77Pk9b2g4jfc0JB1r73R_L_dspOWp9N-Fqv0vycn31XN8W64ebpq7WReZMzIVurYegQkjMRWkCl9pBlNE6Z7iSwrkQOLZGaTQQgxcASXjtE8pkI0tiSc7_uxkRN7sxb_34vdlfFT-eykc9</recordid><startdate>201909</startdate><enddate>201909</enddate><creator>Xiao, Huanhou</creator><creator>Shi, Jinglun</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201909</creationdate><title>A Novel Attribute Selection Mechanism for Video Captioning</title><author>Xiao, Huanhou ; Shi, 
Jinglun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i203t-6f8a1b5bbd09c47b24691c4c8997254399bb2ef756e71cba311d3a6ade4d8c0d3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Attention</topic><topic>Attributes</topic><topic>Decoding</topic><topic>Detectors</topic><topic>Feature extraction</topic><topic>Reinforcement learning</topic><topic>Semantics</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video captioning</topic><topic>Visualization</topic><toplevel>online_resources</toplevel><creatorcontrib>Xiao, Huanhou</creatorcontrib><creatorcontrib>Shi, Jinglun</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Xiao, Huanhou</au><au>Shi, Jinglun</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>A Novel Attribute Selection Mechanism for Video Captioning</atitle><btitle>2019 IEEE International Conference on Image Processing (ICIP)</btitle><stitle>ICIP</stitle><date>2019-09</date><risdate>2019</risdate><spage>619</spage><epage>623</epage><pages>619-623</pages><eissn>2381-8549</eissn><eisbn>9781538662496</eisbn><eisbn>1538662493</eisbn><abstract>Attributes are more and more popular for enhancing the performance of video captioning which requires semantic understanding of videos and the ability of generating natural language descriptions. However, existing methods have flaws in detecting visual attributes. 
As a result, the captioning model may be misled by those erroneous attributes. Besides, each semantic attribute plays a different role in the next-word generation. How to utilize them effectively in video captioning task is still a challenge. To tackle these problems, in this paper, we propose a novel framework which imposes an attention mechanism guided by the visual attention on the detected video attributes to make a soft-selection over them. Simultaneously, the reinforcement learning algorithm is employed with the motivation to better select the useful attributes. Experimental results on benchmark datasets demonstrate that the proposed attribute selection mechanism can focus on appropriate attributes and boost the caption models.</abstract><pub>IEEE</pub><doi>10.1109/ICIP.2019.8803785</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2381-8549
ispartof 2019 IEEE International Conference on Image Processing (ICIP), 2019, p.619-623
issn 2381-8549
language eng
recordid cdi_ieee_primary_8803785
source IEEE Xplore All Conference Series
subjects Attention
Attributes
Decoding
Detectors
Feature extraction
Reinforcement learning
Semantics
Task analysis
Training
Video captioning
Visualization
title A Novel Attribute Selection Mechanism for Video Captioning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A49%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Novel%20Attribute%20Selection%20Mechanism%20for%20Video%20Captioning&rft.btitle=2019%20IEEE%20International%20Conference%20on%20Image%20Processing%20(ICIP)&rft.au=Xiao,%20Huanhou&rft.date=2019-09&rft.spage=619&rft.epage=623&rft.pages=619-623&rft.eissn=2381-8549&rft_id=info:doi/10.1109/ICIP.2019.8803785&rft.eisbn=9781538662496&rft.eisbn_list=1538662493&rft_dat=%3Cieee_CHZPO%3E8803785%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-6f8a1b5bbd09c47b24691c4c8997254399bb2ef756e71cba311d3a6ade4d8c0d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8803785&rfr_iscdi=true