Egocentric activity recognition using two-stage decision fusion
Published in: Neural Computing & Applications, 2024-12, Vol. 36 (36), pp. 22889-22903
Main Authors: , ,
Format: Article
Language: English
Subjects:
Summary: The widespread adoption of wearable devices equipped with advanced sensor technologies has fueled the rapid growth of egocentric video capture, known as First Person Vision (FPV). Unlike traditional third-person videos, FPV exhibits distinct characteristics such as significant ego-motions and frequent scene changes, rendering conventional vision-based methods ineffective. This paper introduces a novel audio-visual decision fusion framework for egocentric activity recognition (EAR) that addresses these challenges. The proposed framework employs a two-stage decision fusion pipeline with explicit weight learning, integrating both audio and visual cues to enhance overall recognition performance. Additionally, a new publicly available dataset, the Egocentric Outdoor Activity Dataset, comprising 1392 video clips featuring 30 diverse outdoor activities, is also introduced to facilitate comparative evaluations of EAR algorithms and spur further research in the field. Experimental results demonstrate that the integration of audio and visual information significantly improves activity recognition performance, outperforming single modality approaches and equally weighted decisions from multiple modalities.
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-024-10463-0
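The summary above describes a two-stage decision fusion pipeline with explicit weight learning over audio and visual cues. The record does not include the authors' implementation, so the following is only a minimal sketch of weighted decision-level (late) fusion under that reading; all names here (WeightedLateFusion, the learnable per-modality weights, the stand-in classifier outputs) are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of weighted decision-level (late) fusion of audio and visual
# class scores. Illustrative only; names and structure are assumptions, not
# the authors' actual implementation.
import torch
import torch.nn as nn


class WeightedLateFusion(nn.Module):
    """Fuses per-class scores from two modalities with learnable weights."""

    def __init__(self):
        super().__init__()
        # One learnable logit per modality (audio, visual). A softmax keeps the
        # resulting fusion weights positive and summing to one.
        self.modality_logits = nn.Parameter(torch.zeros(2))

    def forward(self, audio_scores: torch.Tensor, visual_scores: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, num_classes) class probabilities produced by the
        # single-modality classifiers.
        w = torch.softmax(self.modality_logits, dim=0)
        return w[0] * audio_scores + w[1] * visual_scores


# Usage: fuse per-clip predictions for a 30-class activity set (as in the dataset above).
fusion = WeightedLateFusion()
audio_p = torch.softmax(torch.randn(4, 30), dim=1)   # stand-in audio classifier outputs
visual_p = torch.softmax(torch.randn(4, 30), dim=1)  # stand-in visual classifier outputs
fused = fusion(audio_p, visual_p)
predicted_activity = fused.argmax(dim=1)              # predicted class index per clip
```

Per the abstract, the fusion weights are learned explicitly rather than fixed to equal values; in a sketch like this they would be trained on labelled clips, jointly with or on top of the frozen single-modality classifiers.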