
Multi-stage cascaded deconvolution for depth map and surface normal prediction from single image

Bibliographic Details
Published in: Pattern Recognition Letters, 2019-11, Vol. 127, pp. 165-173
Main Authors: Padhy, Ram Prasad, Chang, Xiaojun, Choudhury, Suman Kumar, Sa, Pankaj Kumar, Bakshi, Sambit
Format: Article
Language: English
Description
Summary:
•This paper proposes a fully convolutional deep architecture for predicting depth and surface normal from a single RGB image.
•A state-of-the-art CNN, DenseNet, is employed to extract deep convolutional features from the input RGB images.
•Deconvolution is applied at various stages of the network to enhance the spatial resolution of intermediate feature maps.
•Bottleneck layers are incorporated to obtain feature maps of the same resolution as those of the deconvolution blocks.
•Concatenated features are progressed along the deep network in a cascaded manner to yield the targets (depth, surface normal).
[Display omitted]
Understanding the 3D perspective of a scene is imperative for improving the precision of intelligent autonomous systems. The difficulty is compounded when only one image of the scene is available. In this regard, we propose a fully convolutional deep framework that predicts the depth map and surface normal from a single RGB image within a common architecture. The DenseNet CNN architecture is employed to learn the complex mapping between an input RGB image and its corresponding 3D primitives, using a novel multi-stage cascaded deconvolution in which the output feature maps of each dense block are reused by concatenating them with the feature maps of the corresponding deconvolution block. The proposed model accepts an input RGB image of size 320 × 240 × 3 and predicts the output in four stages: (1) the input is forwarded through the convolution layers of DenseNet-161 to extract deep features of size 10 × 7 × 2208; (2) a few deconvolution blocks are appended at the end of the DenseNet to enhance the resolution of the target output; (3) the output feature maps of the bottleneck layers, which are appended to the dense blocks of the DenseNet to facilitate feature reuse, are concatenated with the same-resolution feature maps of the deconvolution blocks; (4) these concatenated features are progressed along the network in a cascaded manner to form the final output of resolution 160 × 120.
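A minimal PyTorch sketch of the encoder-decoder scheme described above is given below for illustration. It assumes a DenseNet-161 backbone split at its transition layers; the decoder channel widths, the number of deconvolution blocks, and the bilinear size alignment are illustrative assumptions rather than the authors' exact configuration. Only the overall idea follows the summary: intermediate DenseNet feature maps are concatenated with same-resolution deconvolution outputs and progressed in a cascaded manner to a 160 × 120 prediction.

# Sketch of a DenseNet-161 encoder with a cascaded deconvolution decoder.
# All layer names and channel widths here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import densenet161


class CascadedDeconvNet(nn.Module):
    def __init__(self, out_channels: int = 1):   # 1 channel for depth, 3 for surface normals
        super().__init__()
        features = densenet161().features         # ImageNet weights could be loaded instead
        # Split the encoder so intermediate feature maps can be reused by the decoder.
        self.enc1 = features[:6]    # through transition1 -> (N,  192, H/8,  W/8)
        self.enc2 = features[6:8]   # through transition2 -> (N,  384, H/16, W/16)
        self.enc3 = features[8:]    # remaining blocks    -> (N, 2208, H/32, W/32)

        def deconv(in_ch, out_ch):
            # One deconvolution block: doubles the spatial resolution.
            return nn.Sequential(
                nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        self.up1 = deconv(2208, 512)        # 1/32 -> 1/16
        self.up2 = deconv(512 + 384, 256)   # after concat with enc2, 1/16 -> 1/8
        self.up3 = deconv(256 + 192, 128)   # after concat with enc1, 1/8  -> 1/4
        self.up4 = deconv(128, 64)          # 1/4 -> 1/2, i.e. the 160 x 120 output scale
        self.head = nn.Conv2d(64, out_channels, kernel_size=3, padding=1)

    def forward(self, x):                   # x: (N, 3, 240, 320)
        f1 = self.enc1(x)                   # (N,  192, 30, 40)
        f2 = self.enc2(f1)                  # (N,  384, 15, 20)
        f3 = self.enc3(f2)                  # (N, 2208,  7, 10)  the 10 x 7 x 2208 features

        d = self.up1(f3)                                        # (N, 512, 14, 20)
        d = F.interpolate(d, size=f2.shape[-2:],
                          mode="bilinear", align_corners=False) # align to enc2 resolution
        d = self.up2(torch.cat([d, f2], dim=1))                 # reuse enc2 feature maps
        d = F.interpolate(d, size=f1.shape[-2:],
                          mode="bilinear", align_corners=False)
        d = self.up3(torch.cat([d, f1], dim=1))                 # reuse enc1 feature maps
        d = self.up4(d)                                         # (N, 64, 120, 160)
        return self.head(d)                                     # (N, out_channels, 120, 160)


if __name__ == "__main__":
    net = CascadedDeconvNet(out_channels=1)
    pred = net(torch.randn(1, 3, 240, 320))
    print(pred.shape)   # torch.Size([1, 1, 120, 160]), i.e. 160 x 120 (W x H)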
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2018.07.012