Show Me a Video: A Large-Scale Narrated Video Dataset for Coherent Story Illustration
Published in: IEEE Transactions on Multimedia, 2024-01, Vol. 26, pp. 2456-2466
Main Authors:
Format: Article
Language: English
Summary: Illustrating a multi-sentence story with visual content is a significant challenge in multimedia research. While previous works have focused on sequential story-to-visual representations at the image level or on representing a single sentence with a video clip, illustrating a long multi-sentence story with coherent videos remains an under-explored area. In this paper, we propose the task of video-based story illustration, which aims to visually illustrate a story with retrieved video clips. To support this task, we first create a large-scale dataset of coherent video stories, consisting of 85K narrative stories with 60 pairs of consistent clips and texts per sample. We then propose the Story Context-Enhanced Model, which leverages local and global contextual information within the story, inspired by sequence modeling in language understanding. Comprehensive quantitative experiments demonstrate the effectiveness of our baseline model, and qualitative results together with detailed user studies show that our method can retrieve coherent video sequences for stories.
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2023.3296944
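To make the retrieval setup described in the summary concrete, the sketch below contextualizes per-sentence embeddings with a Transformer run over the whole story (one way to inject local and global story context) and then retrieves the nearest video clip for each sentence by cosine similarity. This is a minimal sketch under stated assumptions, not the paper's actual Story Context-Enhanced Model: the encoder choice, embedding dimension, and all class and variable names (`ContextEnhancedRetriever`, `dim`, etc.) are hypothetical.

```python
# Minimal sketch of context-enhanced story-to-video retrieval. The actual
# model architecture in the paper is not specified in this record; the
# Transformer-over-sentences design and all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextEnhancedRetriever(nn.Module):
    def __init__(self, dim: int = 512, n_layers: int = 2, n_heads: int = 8):
        super().__init__()
        # Transformer encoder over the story's sentence embeddings, so each
        # sentence representation absorbs context from the rest of the story.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=n_heads, batch_first=True
        )
        self.context = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, sent_emb: torch.Tensor, clip_emb: torch.Tensor):
        """
        sent_emb: (num_sentences, dim)  per-sentence text embeddings
        clip_emb: (num_clips, dim)      precomputed video-clip embeddings
        Returns the index of the best-matching clip for each sentence.
        """
        # Contextualize the whole story in one pass (batch of size 1).
        ctx = self.context(sent_emb.unsqueeze(0)).squeeze(0)
        # Cosine similarity between contextualized sentences and all clips.
        sims = F.normalize(ctx, dim=-1) @ F.normalize(clip_emb, dim=-1).T
        return sims.argmax(dim=-1)  # (num_sentences,)

# Toy usage: a 5-sentence story retrieved against a pool of 100 clips.
model = ContextEnhancedRetriever()
sentences = torch.randn(5, 512)   # stand-ins for real text embeddings
clips = torch.randn(100, 512)     # stand-ins for real clip embeddings
print(model(sentences, clips))    # one clip index per sentence
```

In a full system, the random tensors would be replaced by embeddings from trained text and video encoders, and retrieval would be scored with ranking metrics; the point of the sketch is only that each sentence queries the clip pool with a story-aware representation rather than in isolation.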