
Blockwise Temporal-Spatial Pathway Network

Algorithms for video action recognition should consider not only spatial information but also temporal relations, which remains challenging. We propose a 3D-CNN-based action recognition model, called the blockwise temporal-spatial pathway network (BTSNet), which can adjust the temporal and spatial receptive fields via multiple pathways. We designed a novel model inspired by an adaptive kernel-selection-based model, an architecture for effective feature encoding that adaptively chooses spatial receptive fields for image recognition. Expanding this approach to the temporal domain, our model extracts temporal and channel-wise attention and fuses information across various candidate operations. For evaluation, we tested the proposed model on the UCF-101, HMDB-51, SVW, and EPIC-Kitchens datasets and showed that it generalizes well without pretraining. BTSNet also provides interpretable visualization based on spatiotemporal channel-wise attention, which confirms that the blockwise temporal-spatial pathway supports a better representation for 3D convolutional blocks.
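The fusion mechanism described in the abstract — candidate operations with different receptive fields, combined by channel-wise attention — follows the selective-kernel pattern. The sketch below is a minimal illustration of that pattern only, not the authors' BTSNet implementation: the function name, shapes, and the placeholder attention projection (a learned fully connected layer in practice) are assumptions for demonstration.

```python
import numpy as np

def selective_temporal_fusion(feat_small, feat_large):
    """Hypothetical sketch of selective-kernel style fusion over two
    candidate temporal pathways.

    feat_small, feat_large: (C, T, H, W) feature maps produced by candidate
    operations with small and large temporal receptive fields.
    Returns the fused feature map and the per-branch channel attention.
    """
    # Aggregate the candidates, then squeeze to a channel descriptor
    fused = feat_small + feat_large                  # element-wise sum of candidates
    z = fused.mean(axis=(1, 2, 3))                   # global average pool -> (C,)

    # Placeholder branch projections (learned in a real model), followed by
    # a softmax over the two branches so attention weights sum to 1 per channel
    logits = np.stack([z, -z])                       # (2, C)
    attn = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

    # Channel-wise weighted combination of the two pathways
    out = (attn[0][:, None, None, None] * feat_small
           + attn[1][:, None, None, None] * feat_large)
    return out, attn
```

Because the attention is computed per channel, each channel can independently prefer the short- or long-range temporal kernel, which is the property the paper visualizes.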

Bibliographic Details
Published in: arXiv.org, 2022-08
Main Authors: Hong, SeulGi, Min-Kook Choi
Format: Article
Language: English
Subjects: Activity recognition; Algorithms; Object recognition; Spatial data; Visualization
ISSN: 2331-8422
Publisher: Cornell University Library (Ithaca), arXiv.org