
LiveCap: Live Video Captioning with Sequential Encoding Network

Bibliographic Details
Main Authors: Choi, Wangyu, Yoon, Jongwon
Format: Conference Proceeding
Language: English
Description
Summary: Today, video captioning frameworks are very useful in settings such as video surveillance systems. Most of these systems require real-time captioning; however, existing video captioning frameworks still have limitations for live video. Specifically, they require the entire video before they can generate a description. In this paper, we propose LiveCap, a framework for generating sentences corresponding to the current scene in real time from live video. LiveCap consists of three modules: a sequential encoding network, a captioning network, and a context gating network. Our framework accumulates context for sequentially given video segments (sequential encoding network) and generates sentences based on it (captioning network). Furthermore, the context gating network controls the flow between the two networks to determine when to generate sentences. We train and test LiveCap on the ActivityNet Captions dataset and verify that LiveCap generates fluent and coherent captions for live video.
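The abstract describes the interaction of the three modules but gives no implementation details. Below is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together: a recurrent encoder accumulates context over arriving segments, a gate scores whether the context is ready, and a decoder emits a caption when the gate fires. The module names, GRU/linear choices, feature dimensions, and the 0.5 threshold are all assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of the LiveCap-style flow described in the abstract.
# All layer choices and sizes are assumptions; the paper's design may differ.
import torch
import torch.nn as nn


class SequentialEncoder(nn.Module):
    """Accumulates context over sequentially arriving video segments."""
    def __init__(self, feat_dim=500, hidden_dim=512):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, segment_feats, context=None):
        # segment_feats: (batch, frames, feat_dim) features of the new segment
        _, context = self.rnn(segment_feats, context)
        return context  # updated running context, shape (1, batch, hidden_dim)


class ContextGate(nn.Module):
    """Scores whether the accumulated context is ready to be captioned."""
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, context):
        return torch.sigmoid(self.score(context[-1]))  # (batch, 1) in [0, 1]


class CaptionDecoder(nn.Module):
    """Greedily decodes a sentence from the accumulated context."""
    def __init__(self, vocab_size=10000, hidden_dim=512, max_len=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.max_len = max_len

    def forward(self, context):
        h = context[-1]                                   # (batch, hidden_dim)
        token = torch.zeros(h.size(0), dtype=torch.long)  # <bos> id assumed 0
        words = []
        for _ in range(self.max_len):
            h = self.rnn(self.embed(token), h)
            token = self.out(h).argmax(dim=-1)
            words.append(token)
        return torch.stack(words, dim=1)                  # (batch, max_len) ids


# Usage: feed segments as they arrive; caption only when the gate fires.
encoder, gate, decoder = SequentialEncoder(), ContextGate(), CaptionDecoder()
context = None
for segment in [torch.randn(1, 16, 500) for _ in range(3)]:  # dummy segments
    context = encoder(segment, context)
    if gate(context).item() > 0.5:      # decision threshold is an assumption
        caption_ids = decoder(context)  # map ids back to words with a vocab
```

The key design point mirrored here is that encoding and captioning are decoupled: the encoder runs on every segment, while the gate decides when the decoder is invoked, which is what allows captions to be produced for live, incomplete video.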
ISSN:2162-1241
DOI:10.1109/ICTC55196.2022.9952747