LiveCap: Live Video Captioning with Sequential Encoding Network
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Video captioning frameworks are useful in settings such as video surveillance systems. Most of these systems require real-time captioning; however, existing video captioning frameworks still have limitations on live video. Specifically, they need the whole video before they can describe it. In this paper, we propose LiveCap, a framework that generates sentences corresponding to the current scene in real time from live video. LiveCap consists of three modules: a sequential encoding network, a captioning network, and a context gating network. The sequential encoding network accumulates context over sequentially given video segments, and the captioning network generates sentences based on that context. The context gating network controls the flow between the two networks to determine when to generate sentences. We train and test LiveCap on the ActivityNet Captions dataset and verify that LiveCap generates fluent and coherent captions on live video. |
---|---|
ISSN: | 2162-1241 |
DOI: | 10.1109/ICTC55196.2022.9952747 |