
Multibranch Attention Networks for Action Recognition in Still Images

Abstract
Contextual information plays an important role in visual recognition. This is especially true for action recognition, as contextual information, such as the objects a person interacts with and the scene in which the action is performed, is inseparable from a predefined action class. Meanwhile, the human attention mechanism shows a remarkable capability, compared with existing computer vision systems, in discovering contextual information. Inspired by this, we apply a soft attention mechanism by adding two extra branches to the original VGG16 model: one applies scene-level attention while the other applies region-level attention, capturing global and local contextual information, respectively. To make the multibranch model converge well and become fully optimized, a two-step training method with an alternating optimization strategy is proposed. We call this model multibranch attention networks. To validate the effectiveness of the proposed approach, three publicly available human action datasets were used for evaluation under two experimental settings: with and without the bounding box of the target person. The method achieved state-of-the-art results on the PASCAL VOC Action dataset and the Stanford 40 dataset under both settings, and performed well on the Humans Interacting with Common Objects (HICO) dataset.
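
The record includes no implementation details beyond the abstract, but the architecture it describes, a VGG16 trunk with two added soft-attention branches, can be sketched. The following PyTorch sketch is illustrative only: the attention formulation (1x1 convolution scores followed by a spatial softmax), the use of a person crop as the region-level input, and the averaging of branch scores are assumptions, not the paper's documented choices.

```python
# Minimal sketch of a VGG16 with two extra soft-attention branches
# (scene-level and region-level), loosely following the abstract.
# Layer choices, attention formulation, and score fusion are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class SpatialSoftAttention(nn.Module):
    """Soft attention over an H x W conv feature map: a 1x1 conv scores
    each location, a softmax normalises the scores, and the output is the
    attention-weighted sum of the local feature vectors."""

    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feats):                                   # (B, C, H, W)
        b, c, h, w = feats.shape
        attn = F.softmax(self.score(feats).view(b, -1), dim=1)  # (B, H*W)
        feats = feats.view(b, c, -1)                            # (B, C, H*W)
        return torch.bmm(feats, attn.unsqueeze(2)).squeeze(2)   # (B, C)


class MultibranchAttentionNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = vgg16(weights="IMAGENET1K_V1").features  # conv layers
        self.global_head = nn.Linear(512, num_classes)   # original VGG branch
        self.scene_attn = SpatialSoftAttention(512)      # scene-level branch
        self.scene_head = nn.Linear(512, num_classes)
        self.region_attn = SpatialSoftAttention(512)     # region-level branch
        self.region_head = nn.Linear(512, num_classes)

    def forward(self, image, person_crop):
        # person_crop assumes the "with bounding box" setting; in the
        # "without" setting a region proposal would have to stand in.
        scene = self.backbone(image)                   # global context
        region = self.backbone(person_crop)            # local context
        g = self.global_head(scene.mean(dim=(2, 3)))   # plain pooled features
        s = self.scene_head(self.scene_attn(scene))
        r = self.region_head(self.region_attn(region))
        return (g + s + r) / 3                         # assumed: average fusion
```

Averaging the three branch scores is the simplest fusion; the paper may instead concatenate features or weight the branches, which this record does not specify.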
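The abstract also proposes a two-step training method with an alternating optimization strategy to make the multibranch model converge. One plausible reading, again a sketch rather than the authors' procedure, is to first warm up the new, randomly initialised attention branches with the pretrained backbone frozen, and then alternate updates between the two parameter groups. The `train_two_step` helper, the learning rates, and the loader format below are all hypothetical.

```python
# Hypothetical two-step training loop with alternating optimization,
# matching the MultibranchAttentionNet sketch above.
import torch


def train_two_step(model, loader, criterion, warmup_epochs=5, alt_epochs=20):
    # Parameter groups: the two new attention branches vs. the pretrained
    # backbone plus the original (global) classification head.
    attn_params = (list(model.scene_attn.parameters())
                   + list(model.scene_head.parameters())
                   + list(model.region_attn.parameters())
                   + list(model.region_head.parameters()))
    base_params = (list(model.backbone.parameters())
                   + list(model.global_head.parameters()))

    def run_epoch(optimizer):
        # Loader is assumed to yield (whole image, person crop, action label).
        for image, person_crop, label in loader:
            model.zero_grad()  # clear grads on all groups, not just this one
            loss = criterion(model(image, person_crop), label)
            loss.backward()
            optimizer.step()

    # Step 1: warm up the randomly initialised attention branches while the
    # pretrained VGG16 weights stay frozen.
    for p in base_params:
        p.requires_grad_(False)
    warmup_opt = torch.optim.SGD(attn_params, lr=1e-3, momentum=0.9)
    for _ in range(warmup_epochs):
        run_epoch(warmup_opt)

    # Step 2: unfreeze everything and alternate which parameter group is
    # updated each epoch (one reading of "alternating optimization").
    for p in base_params:
        p.requires_grad_(True)
    attn_opt = torch.optim.SGD(attn_params, lr=1e-4, momentum=0.9)
    base_opt = torch.optim.SGD(base_params, lr=1e-4, momentum=0.9)
    for epoch in range(alt_epochs):
        run_epoch(attn_opt if epoch % 2 == 0 else base_opt)
```
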
Bibliographic Details
Published in: IEEE Transactions on Cognitive and Developmental Systems, 2018-12, Vol. 10 (4), p. 1116-1125
Main Authors: Shiyang Yan, Jeremy S. Smith, Wenjin Lu, Bailing Zhang
Format: Article
Language: English
Subjects: Action recognition; Biological system modeling; Computer vision; Context modeling; contextual information; Datasets; Image recognition; Instruments; Moving object recognition; multibranch CNN; Object recognition; Optimization; soft attention mechanism; Target recognition; Training; Vision systems; Visualization
DOI: 10.1109/TCDS.2017.2783944
Publisher: IEEE (Piscataway)
ISSN: 2379-8920
EISSN: 2379-8939