stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition
In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-tempora...
Published in: | IEEE transactions on circuits and systems for video technology 2020-02, Vol.30 (2), p.549-565 |
---|---|
Main Authors: | Qi, Mengshi; Wang, Yunhong; Qin, Jie; Li, Annan; Luo, Jiebo; Van Gool, Luc |
Format: | Article |
Language: | English |
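Subjects: | Action Recognition; Activity recognition; Adaptation models; Group Activity Recognition; Hidden Markov models; Message passing; Performance evaluation; Recurrent neural networks; RNN; Scene Understanding; Semantic Graph; Semantics; Spatio-temporal Attention; Sports; Task analysis |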
cited_by | cdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373 |
---|---|
cites | cdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373 |
container_end_page | 565 |
container_issue | 2 |
container_start_page | 549 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 30 |
creator | Qi, Mengshi; Wang, Yunhong; Qin, Jie; Li, Annan; Luo, Jiebo; Van Gool, Luc |
description | In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method. |
doi_str_mv | 10.1109/TCSVT.2019.2894161 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_8621027</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8621027</ieee_id><sourcerecordid>2352190053</sourcerecordid><originalsourceid>FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373</originalsourceid><addsrcrecordid>eNo9UMtOwzAQtBBIlMIPwMUS5xSvH03MLaqgVKqK1EZwtBzHqVK1TnEcpP497kOcdnZ3ZnY1CD0CGQEQ-VJMVl_FiBKQI5pJDmO4QgMQIksoJeI6YiIgySiIW3TXdRtCgGc8HaDvLuj1woZXnDuch2BdaH4tXtmdjsjg5WKB69bjqW_7Pc5N3DbhgLWr8MxVsal6vT3NW4eX1rRr1xzxPbqp9bazD5c6RMX7WzH5SOaf09kknyeGMRkSSDMSP9NpqiUTJXBhLU9FOeYlVAxqbTiTpjSEWQay4rRidakpVBaEZikbouez7d63P73tgtq0vXfxoqJMUJDRnUUWPbOMb7vO21rtfbPT_qCAqGN-6pSfOuanLvlF0dNZ1Fhr_wXZmAKhKfsDF9Jrfg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2352190053</pqid></control><display><type>article</type><title>stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition</title><source>IEEE Xplore (Online service)</source><creator>Qi, Mengshi ; Wang, Yunhong ; Qin, Jie ; Li, Annan ; Luo, Jiebo ; Van Gool, Luc</creator><creatorcontrib>Qi, Mengshi ; Wang, Yunhong ; Qin, Jie ; Li, Annan ; Luo, Jiebo ; Van Gool, Luc</creatorcontrib><description>In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. 
Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2019.2894161</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Action Recognition ; Activity recognition ; Adaptation models ; Group Activity Recognition ; Hidden Markov models ; Message passing ; Performance evaluation ; Recurrent neural networks ; RNN ; Scene Understanding ; Semantic Graph ; Semantics ; Spatio-temporal Attention ; Sports ; Task analysis</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2020-02, Vol.30 (2), p.549-565</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373</citedby><cites>FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373</cites><orcidid>0000-0002-0306-534X ; 0000-0001-8001-2703 ; 0000-0002-4516-9729 ; 0000-0002-6955-6635</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8621027$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Qi, Mengshi</creatorcontrib><creatorcontrib>Wang, Yunhong</creatorcontrib><creatorcontrib>Qin, Jie</creatorcontrib><creatorcontrib>Li, Annan</creatorcontrib><creatorcontrib>Luo, Jiebo</creatorcontrib><creatorcontrib>Van Gool, Luc</creatorcontrib><title>stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. 
Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.</description><subject>Action Recognition</subject><subject>Activity recognition</subject><subject>Adaptation models</subject><subject>Group Activity Recognition</subject><subject>Hidden Markov models</subject><subject>Message passing</subject><subject>Performance evaluation</subject><subject>Recurrent neural networks</subject><subject>RNN</subject><subject>Scene Understanding</subject><subject>Semantic Graph</subject><subject>Semantics</subject><subject>Spatio-temporal Attention</subject><subject>Sports</subject><subject>Task 
analysis</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNo9UMtOwzAQtBBIlMIPwMUS5xSvH03MLaqgVKqK1EZwtBzHqVK1TnEcpP497kOcdnZ3ZnY1CD0CGQEQ-VJMVl_FiBKQI5pJDmO4QgMQIksoJeI6YiIgySiIW3TXdRtCgGc8HaDvLuj1woZXnDuch2BdaH4tXtmdjsjg5WKB69bjqW_7Pc5N3DbhgLWr8MxVsal6vT3NW4eX1rRr1xzxPbqp9bazD5c6RMX7WzH5SOaf09kknyeGMRkSSDMSP9NpqiUTJXBhLU9FOeYlVAxqbTiTpjSEWQay4rRidakpVBaEZikbouez7d63P73tgtq0vXfxoqJMUJDRnUUWPbOMb7vO21rtfbPT_qCAqGN-6pSfOuanLvlF0dNZ1Fhr_wXZmAKhKfsDF9Jrfg</recordid><startdate>20200201</startdate><enddate>20200201</enddate><creator>Qi, Mengshi</creator><creator>Wang, Yunhong</creator><creator>Qin, Jie</creator><creator>Li, Annan</creator><creator>Luo, Jiebo</creator><creator>Van Gool, Luc</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-0306-534X</orcidid><orcidid>https://orcid.org/0000-0001-8001-2703</orcidid><orcidid>https://orcid.org/0000-0002-4516-9729</orcidid><orcidid>https://orcid.org/0000-0002-6955-6635</orcidid></search><sort><creationdate>20200201</creationdate><title>stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition</title><author>Qi, Mengshi ; Wang, Yunhong ; Qin, Jie ; Li, Annan ; Luo, Jiebo ; Van Gool, Luc</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Action Recognition</topic><topic>Activity recognition</topic><topic>Adaptation 
models</topic><topic>Group Activity Recognition</topic><topic>Hidden Markov models</topic><topic>Message passing</topic><topic>Performance evaluation</topic><topic>Recurrent neural networks</topic><topic>RNN</topic><topic>Scene Understanding</topic><topic>Semantic Graph</topic><topic>Semantics</topic><topic>Spatio-temporal Attention</topic><topic>Sports</topic><topic>Task analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Qi, Mengshi</creatorcontrib><creatorcontrib>Wang, Yunhong</creatorcontrib><creatorcontrib>Qin, Jie</creatorcontrib><creatorcontrib>Li, Annan</creatorcontrib><creatorcontrib>Luo, Jiebo</creatorcontrib><creatorcontrib>Van Gool, Luc</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore (IEEE/IET Electronic Library - IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Qi, Mengshi</au><au>Wang, Yunhong</au><au>Qin, Jie</au><au>Li, Annan</au><au>Luo, Jiebo</au><au>Van Gool, Luc</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition</atitle><jtitle>IEEE transactions on circuits and systems 
for video technology</jtitle><stitle>TCSVT</stitle><date>2020-02-01</date><risdate>2020</risdate><volume>30</volume><issue>2</issue><spage>549</spage><epage>565</epage><pages>549-565</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>In real life, group activity recognition plays a significant and fundamental role in a variety of applications, e.g. sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationship. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely, stagNet, for understanding group activities and individual actions in videos, by combining the spatio-temporal attention mechanism and semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, which is further incorporated with the temporal factor through structural-RNN. By virtue of the "factor sharing" and "message passing" mechanisms, our stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. Besides, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. 
In experiments, four widely-used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2019.2894161</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-0306-534X</orcidid><orcidid>https://orcid.org/0000-0001-8001-2703</orcidid><orcidid>https://orcid.org/0000-0002-4516-9729</orcidid><orcidid>https://orcid.org/0000-0002-6955-6635</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2020-02, Vol.30 (2), p.549-565 |
issn | 1051-8215 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_8621027 |
source | IEEE Xplore (Online service) |
subjects | Action Recognition; Activity recognition; Adaptation models; Group Activity Recognition; Hidden Markov models; Message passing; Performance evaluation; Recurrent neural networks; RNN; Scene Understanding; Semantic Graph; Semantics; Spatio-temporal Attention; Sports; Task analysis |
title | stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T19%3A58%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=stagNet:%20An%20Attentive%20Semantic%20RNN%20for%20Group%20Activity%20and%20Individual%20Action%20Recognition&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Qi,%20Mengshi&rft.date=2020-02-01&rft.volume=30&rft.issue=2&rft.spage=549&rft.epage=565&rft.pages=549-565&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2019.2894161&rft_dat=%3Cproquest_ieee_%3E2352190053%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c339t-1780051a77a935b145ee475b64b1d31fac439cbc03e319d42d3fba21de15a373%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2352190053&rft_id=info:pmid/&rft_ieee_id=8621027&rfr_iscdi=true |