
An End-to-End Future Frame Prediction Method for Vehicle-Centric Driving Videos

Bibliographic Details
Main Authors: Du, Li; Ji, Kaikun; Zhao, Zhicheng; Su, Fei; Zhuang, Bojin
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Description
Summary: In the field of autonomous driving, training an agent to watch and think like human drivers is an efficient way to solve self-driving problems. Inspired by NVIDIA's frame-level command generation task [2] and findings on human memory capacity [13], we propose a future frame prediction method for vehicle-centric driving videos. An end-to-end deep learning architecture, the future frame prediction (FFPRE) network, is proposed, which generates a future frame following the input video sequence. In particular, we develop a general memory preserving module to extract meaningful history information from the input data. This module consists of two parts, namely memory recall and memory refine. We train this module to generate the short-term spatiotemporal information of a given video batch, which is a concatenation of history appearance and temporal clues. These two history clues are then transformed into future representations by a long-term prediction module. Thus, the human driving-prediction process is mimicked in a completely modular manner. Given the FFPRE network's ability to learn both long- and short-term spatiotemporal features, the proposed network can construct an internal representation (content and dynamics) of vehicle-centric driving videos without tracking the trajectory of every pixel. Experimental results on the publicly released NVIDIA and DR(eye)VE datasets indicate that the proposed method is efficient.
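
The abstract describes the FFPRE pipeline only at a high level, so the PyTorch sketch below is a minimal illustration of the two-stage idea rather than a reconstruction of the actual network: a memory preserving stage that summarizes a short clip into history appearance and temporal clues, and a long-term prediction stage that maps their concatenation to a future frame. The class names, layer sizes, and the use of frame differences as the temporal clue are assumptions made for illustration; the record does not specify the FFPRE internals.

```python
# Minimal sketch of the two-stage idea from the abstract (assumed details):
# memory preserving -> concatenated appearance/temporal clues -> future frame.
import torch
import torch.nn as nn


class MemoryPreserving(nn.Module):
    """Summarize a short video batch into history appearance and temporal clues."""

    def __init__(self, in_frames: int, channels: int = 64):
        super().__init__()
        # "Memory recall" (assumed form): compress stacked RGB frames into an appearance code.
        self.recall = nn.Sequential(
            nn.Conv2d(3 * in_frames, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # "Memory refine" (assumed form): compress stacked frame differences into a motion code.
        self.refine = nn.Sequential(
            nn.Conv2d(3 * (in_frames - 1), channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, frames, 3, H, W)
        b, t, c, h, w = clip.shape
        appearance = self.recall(clip.reshape(b, t * c, h, w))
        diffs = clip[:, 1:] - clip[:, :-1]              # simple temporal clue (assumption)
        motion = self.refine(diffs.reshape(b, (t - 1) * c, h, w))
        # Concatenate history appearance and temporal clues, as in the abstract.
        return torch.cat([appearance, motion], dim=1)


class LongTermPrediction(nn.Module):
    """Transform the concatenated history clues into the predicted future frame."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.predict = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),
            nn.Sigmoid(),                                # predicted frame in [0, 1]
        )

    def forward(self, clues: torch.Tensor) -> torch.Tensor:
        return self.predict(clues)


class FFPRE(nn.Module):
    """End-to-end: input clip -> memory clues -> next frame."""

    def __init__(self, in_frames: int = 4):
        super().__init__()
        self.memory = MemoryPreserving(in_frames)
        self.future = LongTermPrediction()

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        return self.future(self.memory(clip))


if __name__ == "__main__":
    model = FFPRE(in_frames=4)
    clip = torch.rand(2, 4, 3, 64, 64)                   # toy vehicle-centric clip
    next_frame = model(clip)
    print(next_frame.shape)                               # torch.Size([2, 3, 64, 64])
```

In this sketch the whole pipeline is differentiable, so it can be trained end-to-end against the ground-truth next frame (e.g., with an L1 or L2 reconstruction loss), mirroring the end-to-end training the abstract describes.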
ISSN: 2642-9357
DOI: 10.1109/VCIP47243.2019.8965824