stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition

Bibliographic Details
Published in: IEEE transactions on circuits and systems for video technology 2020-02, Vol.30 (2), p.549-565
Main Authors: Qi, Mengshi, Wang, Yunhong, Qin, Jie, Li, Annan, Luo, Jiebo, Van Gool, Luc
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373
cites cdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373
container_end_page 565
container_issue 2
container_start_page 549
container_title IEEE transactions on circuits and systems for video technology
container_volume 30
creator Qi, Mengshi
Wang, Yunhong
Qin, Jie
Li, Annan
Luo, Jiebo
Van Gool, Luc
description In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.
doi_str_mv 10.1109/TCSVT.2019.2894161
format article
coden ITCTEM
creationdate 2020-02-01
genre article
ristype JOUR
stitle TCSVT
tpages 17
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
peer_reviewed true
oa free_for_read
orcid https://orcid.org/0000-0002-0306-534X
https://orcid.org/0000-0001-8001-2703
https://orcid.org/0000-0002-4516-9729
https://orcid.org/0000-0002-6955-6635
linktohtml https://ieeexplore.ieee.org/document/8621027
collection IEEE All-Society Periodicals Package (ASPP) 2005-present
IEEE All-Society Periodicals Package (ASPP) 1998-Present
IEEE Xplore (IEEE/IET Electronic Library - IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
fulltext fulltext
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2020-02, Vol.30 (2), p.549-565
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_8621027
source IEEE Xplore (Online service)
subjects Action Recognition
Activity recognition
Adaptation models
Group Activity Recognition
Hidden Markov models
Message passing
Performance evaluation
Recurrent neural networks
RNN
Scene Understanding
Semantic Graph
Semantics
Spatio-temporal Attention
Sports
Task analysis
title stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A58%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=stagNet:%20An%20Attentive%20Semantic%20RNN%20for%20Group%20Activity%20and%20Individual%20Action%20Recognition&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Qi,%20Mengshi&rft.date=2020-02-01&rft.volume=30&rft.issue=2&rft.spage=549&rft.epage=565&rft.pages=549-565&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2019.2894161&rft_dat=%3Cproquest_ieee_%3E2352190053%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2352190053&rft_id=info:pmid/&rft_ieee_id=8621027&rfr_iscdi=true