
Multi-scale Motion Feature Integration for Action Recognition

Bibliographic Details
Main Authors: Lai, Jinming; Zheng, Huicheng; Dang, Jisheng
Format: Conference Proceeding
Language: English
Description
Summary: Analyzing video data with intricate temporal structures and extracting comprehensive motion information remains a significant challenge. In this work, we introduce the multi-scale motion feature integration (MMFI) network, which leverages two key modules for motion analysis: the progressive local context aggregation (PLCA) module and the multi-scale motion excitation (MSME) module. The PLCA module captures frame-level motion details by incrementally processing frame-wise differences near the input frame in the early stages of the network. The MSME module provides motion-attentive channel weights in deeper layers with higher dimensions, incorporating short- and long-range segment-level motion information. These modules synergistically capture motion details across various scales. Our approach is evaluated on the large-scale video dataset Something-Something V1, yielding state-of-the-art performance with minimal computational overhead.
ISSN: 2837-7109
DOI: 10.1109/ICCC59590.2023.10507593
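
Illustrative sketch: the summary names two ideas, frame-wise differences and motion-attentive channel weights computed over short and long temporal ranges. The NumPy code below is a rough illustration of those two ideas only and is not the authors' implementation. The function names, the `strides` parameter, the signed spatial averaging, and the residual form of the re-weighting are all assumptions made for the example; the learned layers a real network would use are replaced by fixed averaging.

```python
import numpy as np

def frame_differences(clip):
    """Differences between consecutive frames.

    clip: array of shape (T, H, W). Returns shape (T - 1, H, W).
    """
    return clip[1:] - clip[:-1]

def motion_channel_weights(features, strides=(1, 2)):
    """Hypothetical multi-range motion weights, one per frame and channel.

    features: array of shape (T, C, H, W).
    For each temporal stride s, the feature difference features[t + s] - features[t]
    is averaged over space and accumulated; positions with no partner frame
    contribute zero. A sigmoid maps the sum to a weight in (0, 1).
    """
    T, C = features.shape[:2]
    pooled = np.zeros((T, C))
    for s in strides:
        diff = features[s:] - features[:-s]          # shape (T - s, C, H, W)
        pooled[:T - s] += diff.mean(axis=(2, 3))     # spatial average per channel
    return 1.0 / (1.0 + np.exp(-pooled))

def excite(features, weights):
    """Re-weight channels with a residual connection (an assumed design)."""
    return features + features * weights[:, :, None, None]

# Toy input: 4 frames, 3 channels, 2x2 spatial size.
# Only channel 0 changes over time (its value equals the frame index).
features = np.zeros((4, 3, 2, 2))
features[:, 0] = np.arange(4.0)[:, None, None]

weights = motion_channel_weights(features)
excited = excite(features, weights)
```

In this toy input, the static channels receive the neutral weight 0.5, while the changing channel receives a larger weight wherever a later frame is available for comparison.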