The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

Bibliographic Details
Published in: International Journal of Computer Vision, 2024, Vol. 132(1), pp. 40–55
Main Authors: Bideau, Pia, Learned-Miller, Erik, Schmid, Cordelia, Alahari, Karteek
Format: Article
Language:English
Description
Summary: A good understanding of geometrical concepts as well as a broad familiarity with objects lead to excellent human perception of moving objects. The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer and even camouflage. How we perceive moving objects so reliably is a longstanding research question in computer vision and borrows findings from related areas such as psychology, cognitive science and physics. One approach to the problem is to teach a deep network to model all of these effects. This is in contrast with the strategy used by human vision, where cognitive processes and body design are tightly coupled and each is responsible for certain aspects of correctly identifying moving objects. Similarly, from the computer vision perspective there is evidence that classical, geometry-based techniques are better suited to the “motion-based” parts of the problem, while deep networks are more suitable for modeling appearance. In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly. We present a novel probabilistic model to estimate the camera’s rotation given the motion field. We then rectify the flow field to obtain a rotation-compensated motion field for subsequent segmentation. This strategy of first estimating camera motion, and then allowing a network to learn the remaining parts of the problem, yields improved results on the widely used DAVIS benchmark as well as the more recent motion segmentation data set MoCA (Moving Camouflaged Animals).
ISSN: 0920-5691, 1573-1405
DOI: 10.1007/s11263-023-01859-x
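The rotation-compensation step described in the summary, rectifying an optical flow field once the camera rotation is known, can be sketched as follows. This is a minimal illustration, not the paper's probabilistic model: it assumes normalized image coordinates and the standard small-rotation (Longuet-Higgins) instantaneous flow model, and the function names are hypothetical. Sign conventions for the rotational terms vary between references.

```python
import numpy as np

def rotational_flow(omega, xs, ys):
    """Flow induced purely by camera rotation omega = (wx, wy, wz),
    at normalized image coordinates (xs, ys), under the small-rotation
    instantaneous motion model (one common sign convention)."""
    wx, wy, wz = omega
    u = xs * ys * wx - (1.0 + xs**2) * wy + ys * wz
    v = (1.0 + ys**2) * wx - xs * ys * wy - xs * wz
    return np.stack([u, v], axis=-1)

def compensate_rotation(flow, omega, xs, ys):
    """Subtract the rotation-induced component from an observed flow
    field, leaving translational parallax plus independent object
    motion for a downstream segmentation network."""
    return flow - rotational_flow(omega, xs, ys)

# Sanity check: a flow field produced by pure camera rotation is
# cancelled exactly, so the residual motion is zero everywhere.
h, w = 4, 5
ys, xs = np.meshgrid(np.linspace(-0.5, 0.5, h),
                     np.linspace(-0.5, 0.5, w), indexing="ij")
omega = (0.01, -0.02, 0.005)
flow = rotational_flow(omega, xs, ys)
residual = compensate_rotation(flow, omega, xs, ys)
```

In the paper the rotation itself is estimated from the motion field; here it is simply given, to isolate the compensation step.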