
Egocentric activity recognition using two-stage decision fusion

Bibliographic Details
Published in: Neural Computing & Applications, December 2024, Vol. 36(36), pp. 22889-22903
Main Authors: Arabacı, Mehmet Ali; Surer, Elif; Temizel, Alptekin
Format: Article
Language:English
Description
Summary: The widespread adoption of wearable devices equipped with advanced sensor technologies has fueled the rapid growth of egocentric video capture, known as First Person Vision (FPV). Unlike traditional third-person videos, FPV exhibits distinct characteristics such as significant ego-motion and frequent scene changes, rendering conventional vision-based methods ineffective. This paper introduces a novel audio-visual decision fusion framework for egocentric activity recognition (EAR) that addresses these challenges. The proposed framework employs a two-stage decision fusion pipeline with explicit weight learning, integrating both audio and visual cues to enhance overall recognition performance. Additionally, a new publicly available dataset, the Egocentric Outdoor Activity Dataset, comprising 1392 video clips covering 30 diverse outdoor activities, is introduced to facilitate comparative evaluations of EAR algorithms and spur further research in the field. Experimental results demonstrate that integrating audio and visual information significantly improves activity recognition performance, outperforming both single-modality approaches and equally weighted decisions from multiple modalities.
ISSN: 0941-0643; 1433-3058
DOI: 10.1007/s00521-024-10463-0
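
The abstract's core idea, weighted late fusion of per-modality decisions, can be illustrated with a minimal Python sketch. The weights w_audio and w_visual below are hypothetical stand-ins for the explicitly learned fusion weights the paper describes; the actual two-stage pipeline and weight-learning procedure are not reproduced here.

    # A minimal sketch of weighted decision (late) fusion, assuming two
    # modality classifiers that each output per-class probabilities.
    # The fusion weights are illustrative, not the paper's learned values.
    import numpy as np

    def fuse_decisions(p_audio, p_visual, w_audio, w_visual):
        """Combine per-class probabilities from two modalities by a
        weighted sum, then renormalize to a valid distribution."""
        fused = w_audio * p_audio + w_visual * p_visual
        return fused / fused.sum()

    # Example: 4 activity classes, with disagreeing modality scores.
    p_audio = np.array([0.10, 0.60, 0.20, 0.10])   # audio classifier output
    p_visual = np.array([0.05, 0.25, 0.55, 0.15])  # visual classifier output

    equal = fuse_decisions(p_audio, p_visual, 0.5, 0.5)
    weighted = fuse_decisions(p_audio, p_visual, 0.3, 0.7)  # hypothetical learned weights

    print("equal-weight prediction:  ", equal.argmax())    # audio dominates the tie
    print("learned-weight prediction:", weighted.argmax()) # visual evidence wins

The example shows why learned weights can beat equal weighting: when one modality is more reliable for a given activity, a uniform average can let the weaker modality flip the final decision.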