A Novel Attribute Selection Mechanism for Video Captioning
Main Authors: | Xiao, Huanhou; Shi, Jinglun |
---|---|
Format: | Conference Proceeding |
Language: | English |
Online Access: | https://ieeexplore.ieee.org/document/8803785 |
cited_by | |
---|---|
cites | |
container_end_page | 623 |
container_issue | |
container_start_page | 619 |
container_title | 2019 IEEE International Conference on Image Processing (ICIP) |
container_volume | |
creator | Xiao, Huanhou; Shi, Jinglun |
description | Attributes are increasingly popular for improving video captioning, a task that requires both semantic understanding of videos and the ability to generate natural language descriptions. However, existing methods detect visual attributes imperfectly, so the captioning model may be misled by erroneous attributes. Moreover, each semantic attribute plays a different role in generating the next word, and how to use attributes effectively in video captioning remains a challenge. To tackle these problems, we propose a novel framework that applies an attention mechanism, guided by visual attention, to the detected video attributes in order to make a soft selection over them. At the same time, reinforcement learning is employed to encourage the selection of useful attributes. Experimental results on benchmark datasets demonstrate that the proposed attribute selection mechanism can focus on appropriate attributes and improve the captioning models. |
doi_str_mv | 10.1109/ICIP.2019.8803785 |
format | conference_proceeding |
publisher | IEEE |
date | 2019-09 |
eisbn | 9781538662496; 1538662493 |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2381-8549 |
ispartof | 2019 IEEE International Conference on Image Processing (ICIP), 2019, p.619-623 |
issn | 2381-8549 |
language | eng |
recordid | cdi_ieee_primary_8803785 |
source | IEEE Xplore All Conference Series |
subjects | Attention Attributes Decoding Detectors Feature extraction Reinforcement learning Semantics Task analysis Training Video captioning Visualization |
title | A Novel Attribute Selection Mechanism for Video Captioning |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A49%3A30IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%20Novel%20Attribute%20Selection%20Mechanism%20for%20Video%20Captioning&rft.btitle=2019%20IEEE%20International%20Conference%20on%20Image%20Processing%20(ICIP)&rft.au=Xiao,%20Huanhou&rft.date=2019-09&rft.spage=619&rft.epage=623&rft.pages=619-623&rft.eissn=2381-8549&rft_id=info:doi/10.1109/ICIP.2019.8803785&rft.eisbn=9781538662496&rft.eisbn_list=1538662493&rft_dat=%3Cieee_CHZPO%3E8803785%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-6f8a1b5bbd09c47b24691c4c8997254399bb2ef756e71cba311d3a6ade4d8c0d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8803785&rfr_iscdi=true |
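The abstract describes a soft selection over detected attributes, made by an attention mechanism guided by visual attention. The record does not include the paper's equations, so the block below is only a minimal sketch of generic soft attention under stated assumptions: attribute embeddings and the attended visual feature share one dimension, and attributes are scored by a scaled dot product with that visual feature. The function names (`soft_select_attributes`, `softmax`, `dot`) are hypothetical, and the reinforcement learning component is not modelled.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def soft_select_attributes(
    attribute_embs: Sequence[Sequence[float]],
    visual_context: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Hypothetical soft selection over detected attributes.

    Assumption (not from the paper): each attribute embedding is scored by a
    scaled dot product with the attended visual feature of the current
    decoding step. Softmax turns the scores into weights, and the weighted
    sum of attribute embeddings is returned as the attribute context.
    Returns (weights, attribute_context).
    """
    d = len(visual_context)
    if not attribute_embs:
        raise ValueError("need at least one attribute embedding")
    if any(len(a) != d for a in attribute_embs):
        raise ValueError("attribute and visual vectors must share a dimension")
    scores = [dot(a, visual_context) / math.sqrt(d) for a in attribute_embs]
    weights = softmax(scores)
    # Weighted sum over attributes, computed one dimension at a time.
    context = [
        sum(w * a[k] for w, a in zip(weights, attribute_embs)) for k in range(d)
    ]
    return weights, context


if __name__ == "__main__":
    # Three toy attribute embeddings and a visual context that resembles the first.
    attrs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    visual = [2.0, 0.0]
    weights, ctx = soft_select_attributes(attrs, visual)
    print([round(w, 3) for w in weights], [round(c, 3) for c in ctx])
```

Running the example gives the largest weight to the first attribute, the one most aligned with the visual context, while the others keep smaller nonzero weights. That is what makes the selection "soft". The scaled dot product was chosen only for brevity; an additive (MLP) scoring function would fit the abstract's description equally well.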