
End-to-end Contextual Perception and Prediction with Interaction Transformer

Bibliographic Details
Main Authors: Li, Lingyun Luke, Yang, Bin, Liang, Ming, Zeng, Wenyuan, Ren, Mengye, Segal, Sean, Urtasun, Raquel
Format: Conference Proceeding
Language: English
Online Access: Request full text
Description
Summary: In this paper, we tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving. Towards this goal, we design a novel approach that explicitly takes into account the interactions between actors. To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer [1] architecture, which we call the Interaction Transformer. Importantly, our model can be trained end-to-end, and runs in real-time. We validate our approach on two challenging real-world datasets: ATG4D [2] and nuScenes [3]. We show that our approach can outperform the state-of-the-art on both datasets. In particular, we significantly improve the social compliance between the estimated future trajectories, resulting in far fewer collisions between the predicted actors.
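The abstract describes a Transformer-style module that lets each actor's representation attend to the other actors when forecasting motion. As a rough illustration only, the sketch below implements plain scaled dot-product attention over a set of per-actor feature vectors; the function name, shapes, and random projections are assumptions for this sketch, not the paper's actual Interaction Transformer architecture or API.

```python
import numpy as np

def interaction_attention(actor_feats: np.ndarray, d_k: int = 16) -> np.ndarray:
    """Hypothetical sketch: each actor attends to all actors' features.

    actor_feats: (num_actors, feat_dim) array -> attended features, same shape.
    """
    rng = np.random.default_rng(0)
    n, d = actor_feats.shape
    # In a trained model these projections are learned; random here for the sketch.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)

    Q, K, V = actor_feats @ W_q, actor_feats @ W_k, actor_feats @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                # pairwise actor-actor affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over actors
    return weights @ V                             # aggregate other actors' features

# Example: 5 actors, each with a 32-dimensional feature vector.
feats = np.random.default_rng(1).standard_normal((5, 32))
out = interaction_attention(feats)
print(out.shape)  # (5, 32)
```

In the paper's setting, the attended output would feed a recurrent decoder so that each actor's predicted trajectory reflects the states of nearby actors, which is what the claimed reduction in predicted collisions relies on.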
ISSN:2153-0866
DOI:10.1109/IROS45743.2020.9341392