An efficient action proposal processing approach for temporal action detection
Temporal action detection is a fundamental yet challenging task in video understanding. It is important to process the action proposals for action classification and temporal boundary localization. Some methods process action proposals by exploiting the relations between them. However, learning the relations between numerous action proposals is time-consuming and requires huge computation and memory storage. Each proposal contains contextual information extracted from video segments, and redundant information aggregation has a negative impact on the final detection performance. In this paper, we exploit an efficient model which processes each proposal individually and learns intra-proposal features adequately, avoiding the interference of redundant information to achieve more effective detection. We also design relational learning models based on mean pooling, self-attention, and temporal convolution to compare with the intra-proposal learning model. Extensive experiments show that our method outperforms the relation learning models and achieves competitive performance on two standard benchmarks. Moreover, efficiency experiments also verify that our model is more efficient than the relation learning methods.
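The abstract contrasts per-proposal (intra-proposal) feature learning with proposal-relation modeling. Below is a minimal PyTorch sketch of the two designs, not the authors' code: the 512-dimensional features, layer sizes, and single attention layer are illustrative assumptions, since this record does not specify the paper's actual architecture.

```python
import torch
import torch.nn as nn


class IntraProposalMLP(nn.Module):
    """Processes each proposal independently (the approach the abstract
    favors): cost grows linearly in the number of proposals N, with no
    N x N interaction map to compute or store."""

    def __init__(self, dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, proposals: torch.Tensor) -> torch.Tensor:
        # proposals: (N, dim); each row is transformed independently.
        return self.mlp(proposals)


class RelationSelfAttention(nn.Module):
    """One of the relation-learning baselines the abstract mentions:
    self-attention over all proposals builds an N x N attention map,
    so time and memory grow quadratically in N."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, proposals: torch.Tensor) -> torch.Tensor:
        x = proposals.unsqueeze(0)           # (1, N, dim)
        out, _ = self.attn(x, x, x)          # aggregates across proposals
        return out.squeeze(0)


feats = torch.randn(100, 512)                # 100 candidate proposals
print(IntraProposalMLP()(feats).shape)       # torch.Size([100, 512])
print(RelationSelfAttention()(feats).shape)  # torch.Size([100, 512])
```

The efficiency argument falls out of the shapes: the MLP touches N rows, while the attention baseline materializes N x N weights, which is what makes relation learning over numerous proposals expensive in compute and memory.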
Published in: Neurocomputing (Amsterdam), 2025-03-28, Vol. 623, p. 129294, Article 129294
Main Authors: Hu, Xuejiao; Dai, Jingzhao; Li, Ming; Li, Yang; Du, Sidan
Format: Article
Language: English
Publisher: Elsevier B.V.
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2024.129294
Source: ScienceDirect Freedom Collection
Subjects: Action proposal processing; Multi-layer perceptron; Temporal action detection; Video understanding