Loading…

Progressive Localization Networks for Language-Based Moment Localization

This article targets the task of language-based video moment localization. The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments. Most existing methods prefer to first sample sufficient candidate m...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on multimedia computing communications and applications 2023-02, Vol.19 (2), p.1-21, Article 55
Main Authors: Zheng, Qi, Dong, Jianfeng, Qu, Xiaoye, Yang, Xun, Wang, Yabing, Zhou, Pan, Liu, Baolong, Wang, Xun
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This article targets the task of language-based video moment localization. The language-based setting of this task allows for an open set of target activities, resulting in a large variation of the temporal lengths of video moments. Most existing methods prefer to first sample sufficient candidate moments with various temporal lengths, then match them with the given query to determine the target moment. However, candidate moments generated with a fixed temporal granularity may be suboptimal to handle the large variation in moment lengths. To this end, we propose a novel multi-stage Progressive Localization Network (PLN) that progressively localizes the target moment in a coarse-to-fine manner. Specifically, each stage of PLN has a localization branch and focuses on candidate moments that are generated with a specific temporal granularity. The temporal granularities of candidate moments are different across the stages. Moreover, we devise a conditional feature manipulation module and an upsampling connection to bridge the multiple localization branches. In this fashion, the later stages are able to absorb the previously learned information, thus facilitating the more fine-grained localization. Extensive experiments on three public datasets demonstrate the effectiveness of our proposed PLN for language-based moment localization, especially for localizing short moments in long videos.
ISSN:1551-6857
1551-6865
DOI:10.1145/3543857