Loading…

Darkness-Adaptive Action Recognition: Leveraging Efficient Tubelet Slow-Fast Network for Industrial Applications

Infrared (IR) technology has emerged as a solution for monitoring dark environments. It offers resilience to shifting illumination, appearance changes, and shadows, with applications spanning self-driving cars, robotics, nighttime security, and many other fields. While existing state-of-the-art RGB-...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on industrial informatics 2024-12, Vol.20 (12), p.13676-13686
Main Authors:	Munsif, Muhammad, Khan, Noman, Hussain, Altaf, Kim, Min Je, Baik, Sung Wook
Format:	Article
Language:	English
Subjects:	Accuracy Action recognition (AR) Autonomous cars Computational modeling Computer architecture Computer vision Dark adaptation Darkness Feature extraction Heuristic algorithms Human activity recognition Industrial applications Industry applications infrared (IR) vision Infrared imaging Large scale integration large-scale optimization Modules Robotics State of the art
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Infrared (IR) technology has emerged as a solution for monitoring dark environments. It offers resilience to shifting illumination, appearance changes, and shadows, with applications spanning self-driving cars, robotics, nighttime security, and many other fields. While existing state-of-the-art RGB-based human action recognition (AR) models exhibit limitations in scalability for action understanding under uncertain, low-light, or dark conditions. Integrating these with IR data faces challenges due to changes in modality, high resource demands, and strict latency requirements. Such issues hinder the deployment of these technologies in real-world settings. To overcome these challenges, we introduce a novel slow-fast tubelet (SFT) processing framework designed for efficient and accurate AR in IR-based scenarios. The SFT framework comprises three modules: tubelet preprocessing (TPP), feature extraction, and the feature lateral connection and recognition module (FELCM). The TPP module refines IR streams by extracting the region of interest, filtering detected objects, removing noise, and generating tubelets of refined frames. The FELCM processes refined tubelet through two pathways, where the fast tubelet path operates at a high rate and the slow tubelet path operates at a slow rate. These pathways interconnect through lateral connections, facilitating mutual updates, and enhancing the prediction efficiency. We conducted extensive experiments on benchmark datasets, including NTURGB-D 120 and infrared action recognition (InfAR). The results demonstrate that our proposed SFT framework surpasses state-of-the-art approaches in terms of accuracy (2.7% and 3.3% improvement, respectively), computational cost, and inference latency while maintaining the competitive recognition performance. Our framework's promising results underscore its potential for direct deployment in real-world applications.
ISSN:	1551-3203 1941-0050
DOI:	10.1109/TII.2024.3431070