
Fused GRU with semantic-temporal attention for video captioning

Bibliographic Details
Published in:Neurocomputing (Amsterdam) 2020-06, Vol.395, p.222-228
Main Authors: Gao, Lianli, Wang, Xuanhan, Song, Jingkuan, Liu, Yang
Format: Article
Language:English
Description
Summary:The encoder-decoder framework has been widely used for video captioning with promising results, and various attention mechanisms have been proposed to further improve performance. While temporal attention determines where to look, semantic attention determines the context. However, the combination of semantic and temporal attention has never been exploited for video captioning. To tackle this issue, we propose an end-to-end pipeline named Fused GRU with Semantic-Temporal Attention (STA-FG), which explicitly incorporates high-level visual concepts into the generation of semantic-temporal attention for video captioning. The encoder network extracts visual features from the videos and predicts their semantic concepts, while the decoder network focuses on efficiently generating coherent sentences using both visual features and semantic concepts. Specifically, the decoder combines the visual and semantic representations and incorporates a semantic-temporal attention mechanism in a fused GRU network to accurately generate sentences for video captioning. We experimentally evaluate our approach on two prevalent datasets, MSVD and MSR-VTT, and the results show that our STA-FG achieves the best current performance on both BLEU and METEOR.
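The abstract describes a decoder that attends separately over temporal visual features and semantic concept embeddings, then feeds both attended contexts into a GRU. The sketch below illustrates one such decoding step with NumPy. It is an interpretation of the general idea, not the paper's implementation: all dimensions, the additive attention form, and the concatenation-based fusion are hypothetical choices, and the word-embedding input to the GRU is omitted for brevity.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(features, h, W_f, W_h):
    # Additive attention: score each feature row against the hidden state,
    # then return the attention-weighted context vector.
    scores = np.tanh(features @ W_f + h @ W_h).sum(axis=1)
    alpha = softmax(scores)
    return alpha @ features


def fused_gru_step(h, x, params):
    # Standard GRU cell applied to the fused input x.
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = 1.0 / (1.0 + np.exp(-(x @ Wz + h @ Uz)))      # update gate
    r = 1.0 / (1.0 + np.exp(-(x @ Wr + h @ Ur)))      # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)          # candidate state
    return (1 - z) * h + z * h_tilde


# Hypothetical sizes: visual feature dim, concept embedding dim, hidden dim.
D_VIS, D_SEM, D_HID = 8, 6, 10
rng = np.random.default_rng(0)

# Toy inputs: 5 frame-level visual features, 4 semantic-concept embeddings.
V = rng.normal(size=(5, D_VIS))
S = rng.normal(size=(4, D_SEM))
h = np.zeros(D_HID)

# Randomly initialized attention parameters (shapes are assumptions).
Wv, Whv = rng.normal(size=(D_VIS, D_VIS)), rng.normal(size=(D_HID, D_VIS))
Ws, Whs = rng.normal(size=(D_SEM, D_SEM)), rng.normal(size=(D_HID, D_SEM))

D_IN = D_VIS + D_SEM
gru_params = tuple(rng.normal(size=shape)
                   for shape in [(D_IN, D_HID), (D_HID, D_HID)] * 3)

# One decoding step: temporal attention over frames, semantic attention over
# concepts, fuse the two contexts, and update the GRU hidden state.
ctx_v = attend(V, h, Wv, Whv)   # temporal attention context
ctx_s = attend(S, h, Ws, Whs)   # semantic attention context
x = np.concatenate([ctx_v, ctx_s])
h = fused_gru_step(h, x, gru_params)
print(h.shape)
```

In a full decoder this step would run once per output word, with the updated hidden state projected onto the vocabulary to produce the next caption token.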
ISSN:0925-2312
1872-8286
DOI:10.1016/j.neucom.2018.06.096