Multimodal vision-based human action recognition using deep learning: a review
Vision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources essentially conveys complementary information and improves the res...
Published in: | The Artificial intelligence review 2024-06, Vol.57 (7), p.178, Article 178 |
---|---|
Main Authors: | Shafizadegan, Fatemeh; Naghsh-Nilchi, Ahmad R.; Shabaninia, Elham |
Format: | Article |
Language: | English |
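Subjects: | Artificial Intelligence; Benchmarks; Computer Science; Computer vision; Datasets; Deep learning; Human activity recognition |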
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c244t-16ac511bbf19adc531c9f9a47de96bac7aa638f4d6172dfe0349b8ed47df21353 |
container_end_page | |
container_issue | 7 |
container_start_page | 178 |
container_title | The Artificial intelligence review |
container_volume | 57 |
creator | Shafizadegan, Fatemeh; Naghsh-Nilchi, Ahmad R.; Shabaninia, Elham |
description | Vision-based Human Action Recognition (HAR) is a hot topic in computer vision. Recently, deep-based HAR has shown promising results. HAR using a single data modality is a common approach; however, the fusion of different data sources essentially conveys complementary information and improves the results. This paper comprehensively reviews deep-based HAR methods using multiple visual data modalities. The main contribution of this paper is categorizing existing methods into four levels, which provides an in-depth and comparable analysis of approaches in various aspects. So, at the first level, proposed methods are categorized based on the employed modalities. At the second level, methods categorized in the first level are classified based on the employment of complete modalities or working with missing modalities at the test time. At the third level, complete and missing modality branches are categorized based on existing approaches. Finally, similar frameworks in the third category are grouped together. In addition, a comprehensive comparison is provided for publicly available benchmark datasets, which helps to compare and choose suitable datasets for a task or to develop new datasets. This paper also compares the performance of state-of-the-art methods on benchmark datasets. The review concludes by highlighting several future directions. |
doi_str_mv | 10.1007/s10462-024-10730-5 |
format | article |
fullrecord | Source record TN_cdi_proquest_journals_3069655004 (source ID proquest_cross; ProQuest ID 3069655004; source system PC; source format XML; source type Aggregation Database). Fields not listed elsewhere in this table: abbreviated title: Artif Intell Rev; publication date: 2024-06-19; publisher: Dordrecht: Springer Netherlands (also listed as Springer Nature B.V); rights: The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License; peer reviewed: yes; open access: free_for_read; collections: SpringerOpen; CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Library & Information Sciences Abstracts (LISA); Library & Information Science Abstracts (LISA); ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; delivery category: Remote Search Resource. |
fulltext | fulltext |
identifier | ISSN: 1573-7462 |
ispartof | The Artificial intelligence review, 2024-06, Vol.57 (7), p.178, Article 178 |
issn | 1573-7462; 0269-2821 |
language | eng |
recordid | cdi_proquest_journals_3069655004 |
source | Library & Information Science Abstracts (LISA); Springer Nature; Springer Nature - SpringerLink Journals - Fully Open Access |
subjects | Artificial Intelligence; Benchmarks; Computer Science; Computer vision; Datasets; Deep learning; Human activity recognition |
title | Multimodal vision-based human action recognition using deep learning: a review |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T03%3A36%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20vision-based%20human%20action%20recognition%20using%20deep%20learning:%20a%20review&rft.jtitle=The%20Artificial%20intelligence%20review&rft.au=Shafizadegan,%20Fatemeh&rft.date=2024-06-19&rft.volume=57&rft.issue=7&rft.spage=178&rft.pages=178-&rft.artnum=178&rft.issn=1573-7462&rft.eissn=1573-7462&rft_id=info:doi/10.1007/s10462-024-10730-5&rft_dat=%3Cproquest_cross%3E3069655004%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c244t-16ac511bbf19adc531c9f9a47de96bac7aa638f4d6172dfe0349b8ed47df21353%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3069655004&rft_id=info:pmid/&rfr_iscdi=true |