
Adaptation-Oriented Feature Projection for One-Shot Action Recognition

One-shot action recognition aims at recognizing actions in unseen classes in cases where only one training video is provided. Compared with one-shot image recognition, one-shot learning on videos is more difficult because the temporal dimension of video can lead to greater variation. To handle this variation, it is important to conduct further adaptation in the one-shot training process, despite the scarcity of the training data. While meta-learning is an option for facilitating this adaptation, it cannot be directly applied for two reasons: first, deep networks for action recognition can make current meta-learning methods infeasible to run because of their high computational complexity; second, due to the greater variation in actions, the adapted performance may not be higher than the un-adapted one, making it difficult to train the model by means of meta-learning. To address these problems and facilitate the adaptation, we propose the Adaptation-Oriented Feature (AOF) projection for one-shot action recognition. We first pre-train the base network on seen classes. The output of the network is projected to the adaptation-oriented feature space by fusing the important feature dimensions that are sensitive to adaptation. Subsequently, a small dataset (a.k.a. task) is sampled from seen classes to simulate the unseen-class training and testing settings. The feature adaptation is performed on the training data of this task to integrate the distribution information of the adapted feature. To reduce over-fitting, the triplet loss is applied to handle temporal variation with fewer parameters during the adaptation. On the testing data of this task, the losses on both adapted and un-adapted features are calculated to train the projection matrix. This sampling-adaptation-training procedure is then repeated on seen classes until convergence. Extensive experimental results on two challenging one-shot action recognition datasets demonstrate that our proposed method outperforms state-of-the-art methods.

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2020-12, Vol. 22 (12), pp. 3166-3179
Main Authors: Zou, Yixiong; Shi, Yemin; Shi, Daochen; Wang, Yaowei; Liang, Yongsheng; Tian, Yonghong
Format: Article
Language: English
Subjects: Adaptation; Adaptation models; adaptation-oriented feature projection; AOF; Computational modeling; Data integration; Datasets; fast adaptation; Feature recognition; Image recognition; Learning; Object recognition; One-shot action recognition; Projection; Task analysis; Training; Training data
DOI: 10.1109/TMM.2020.2972128
ISSN: 1520-9210
EISSN: 1941-0077
Publisher: IEEE, Piscataway
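
The abstract describes a concrete episodic procedure: a pre-trained, frozen base network, an adaptation-oriented projection over its features, tasks sampled from seen classes, a few triplet-loss adaptation steps on each task's training data, and a combined adapted/un-adapted loss on the task's testing data to update the projection. The sketch below illustrates that loop under stated assumptions; the names `sample_task`, the `.videos`/`.triplets` task structure, the sigmoid-gated `project` function, and all hyperparameters are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of the sampling-adaptation-training loop described in the
# abstract. All helper names and the task data structure are assumptions.
import torch
import torch.nn.functional as F


def project(features, aof_weights):
    # Adaptation-oriented projection: re-weight (fuse) the feature
    # dimensions that are sensitive to adaptation.
    return features * torch.sigmoid(aof_weights)


def triplet_loss(anchor, positive, negative, margin=1.0):
    # Triplet loss adapts with few parameters (no per-class classifier).
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()


def train_aof(base_network, aof_weights, sample_task, num_episodes=1000,
              adapt_steps=5, adapt_lr=1e-2, outer_lr=1e-3):
    optimizer = torch.optim.Adam([aof_weights], lr=outer_lr)
    for _ in range(num_episodes):
        # 1. Sample a small task from the *seen* classes to simulate the
        #    unseen-class training (support) and testing (query) settings.
        support, query = sample_task()
        with torch.no_grad():
            # The deep base network stays frozen; only the cheap projection
            # is adapted, which keeps the inner loop tractable.
            s_feat = base_network(support.videos)
            q_feat = base_network(query.videos)

        # 2. Adapt on the task's training data with a few triplet-loss steps
        #    on a copy of the projection parameters.
        adapted = aof_weights.clone()
        for _ in range(adapt_steps):
            a, p, n = (project(s_feat[idx], adapted)
                       for idx in support.triplets)
            grad, = torch.autograd.grad(triplet_loss(a, p, n), adapted,
                                        create_graph=True)
            adapted = adapted - adapt_lr * grad

        # 3. On the task's testing data, combine the losses of the adapted
        #    and un-adapted features to train the projection parameters.
        losses = []
        for weights in (adapted, aof_weights):
            a, p, n = (project(q_feat[idx], weights)
                       for idx in query.triplets)
            losses.append(triplet_loss(a, p, n))
        loss = sum(losses)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In this sketch `aof_weights` would be a learnable vector of the feature dimension, e.g. `torch.zeros(feature_dim, requires_grad=True)`; summing the adapted and un-adapted query losses is one plausible reading of "the losses on both adapted and un-adapted features are calculated to train the projection matrix" and addresses the case where adaptation alone does not improve performance.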