Multimodal vision-based human action recognition using deep learning: a review

Bibliographic Details
Published in: Artificial Intelligence Review, 2024-06, Vol. 57(7), p. 178, Article 178
Main Authors: Shafizadegan, Fatemeh; Naghsh-Nilchi, Ahmad R.; Shabaninia, Elham
Format: Article
Language: English
Subjects: Artificial Intelligence; Benchmarks; Computer Science; Computer vision; Datasets; Deep learning; Human activity recognition
DOI: 10.1007/s10462-024-10730-5
ISSN: 0269-2821 (print); 1573-7462 (electronic)
Publisher: Springer Netherlands, Dordrecht
Source: Library & Information Science Abstracts (LISA); Springer Nature; Springer Nature - SpringerLink Journals - Fully Open Access
Rights: The Author(s) 2024; published under the Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0/)

Abstract
Vision-based Human Action Recognition (HAR) is an active research topic in computer vision, and deep learning-based HAR has recently shown promising results. HAR using a single data modality is the most common approach; however, fusing different data sources conveys complementary information and improves results. This paper comprehensively reviews deep learning-based HAR methods that use multiple visual data modalities. Its main contribution is a four-level categorization of existing methods, which enables an in-depth, comparable analysis of approaches from several perspectives. At the first level, methods are categorized by the modalities they employ. At the second level, the methods in each first-level category are classified by whether they require all modalities at test time or can work with missing modalities. At the third level, the complete-modality and missing-modality branches are categorized by the approaches they follow. Finally, similar frameworks within each third-level category are grouped together. In addition, a comprehensive comparison of publicly available benchmark datasets is provided, which helps in comparing and choosing suitable datasets for a task, or in developing new ones. The paper also compares the performance of state-of-the-art methods on these benchmark datasets, and the review concludes by highlighting several future directions.
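
To make the fusion and missing-modality distinction in the abstract concrete, the sketch below shows a minimal late-fusion classifier that combines class scores from two visual modalities (e.g., RGB and depth) and falls back to a single modality when the other is absent at test time. This is an illustrative example only, not the reviewed paper's method; it assumes PyTorch, and the feature dimensions, class count, and module names are all made up for the sketch.

    import torch
    import torch.nn as nn

    class LateFusionHAR(nn.Module):
        """Minimal late-fusion sketch: one classification head per modality,
        fused by averaging class scores. Dimensions are illustrative; real
        systems would put CNN/transformer backbones in front of the heads."""

        def __init__(self, rgb_dim=2048, depth_dim=2048, num_classes=60):
            super().__init__()
            self.rgb_head = nn.Linear(rgb_dim, num_classes)
            self.depth_head = nn.Linear(depth_dim, num_classes)

        def forward(self, rgb_feat, depth_feat=None):
            logits = self.rgb_head(rgb_feat)
            if depth_feat is not None:
                # Complete-modality case: average the two score vectors.
                logits = 0.5 * (logits + self.depth_head(depth_feat))
            # Missing-modality case: depth absent at test time, use RGB alone.
            return logits

    # Usage: a batch of 4 clip-level features per modality.
    model = LateFusionHAR()
    rgb = torch.randn(4, 2048)
    depth = torch.randn(4, 2048)
    print(model(rgb, depth).shape)  # torch.Size([4, 60])
    print(model(rgb).shape)         # same shape with depth missing

Score averaging is only one of the fusion strategies the review's taxonomy covers; early fusion (concatenating features before a shared head) and learned attention-based fusion occupy other branches of the same categorization.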