Multi-stage cascaded deconvolution for depth map and surface normal prediction from single image
Published in: Pattern Recognition Letters, 2019-11, Vol. 127, pp. 165-173
Main Authors:
Format: Article
Language: English
Summary:
•This paper proposes a fully convolutional deep architecture for predicting depth and surface normal from a single RGB image.
•A state-of-the-art CNN, DenseNet, is employed to extract deep convolutional features from the input RGB images.
•Deconvolution is applied at various stages of the network to enhance the spatial resolution of the intermediate feature maps.
•Bottleneck layers are incorporated to obtain feature maps of the same resolution as those of the deconvolution blocks.
•The concatenated features are progressed along the deep network in a cascaded manner to yield the targets (depth and surface normal).
Understanding the 3D perspective of a scene is imperative for improving the precision of intelligent autonomous systems. The difficulty is compounded when only one image of the scene is at our disposal. In this regard, we propose a fully convolutional deep framework that predicts the depth map and surface normal from a single RGB image within a common architecture. The proposed model accepts an input RGB image of size 320 × 240 × 3 and predicts the output in 4 stages: (1) it forwards the input through the convolution layers of DenseNet-161 to extract deep features of size 10 × 7 × 2208; (2) a few deconvolution blocks are appended at the end of the DenseNet to enhance the resolution of the target output; (3) the output feature maps of the bottleneck layers, which are appended to the dense blocks of the DenseNet to facilitate feature reuse, are concatenated with the same-resolution feature maps of the deconvolution blocks; (4) these concatenated features are progressed along the network in a cascaded manner to form the final output of resolution 160 × 120.
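The spatial dimensions quoted in the abstract can be traced with a short illustrative sketch (not the authors' code). It assumes the DenseNet-161 encoder downsamples by an overall factor of 32 and that each deconvolution block doubles the spatial resolution; the function name `stage_resolutions` is hypothetical.

```python
# Illustrative sketch, assuming a 32x encoder downsampling factor and
# deconvolution blocks that each double spatial resolution.

def stage_resolutions(width, height, downsample=32, n_deconv=4):
    """Return the (W, H) of the encoder output followed by each
    deconvolution stage's output."""
    w, h = width // downsample, height // downsample
    stages = [(w, h)]
    for _ in range(n_deconv):
        w, h = w * 2, h * 2
        stages.append((w, h))
    return stages

print(stage_resolutions(320, 240))
# [(10, 7), (20, 14), (40, 28), (80, 56), (160, 112)]
```

Note that simple doubling from the 10 × 7 encoder output yields 160 × 112, while the paper reports a final map of 160 × 120; the actual deconvolution layers presumably use padding or output-size choices beyond naive doubling to recover the 4:3 aspect ratio.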
The DenseNet CNN architecture is employed to learn the complex mapping between an input RGB image and its corresponding 3D primitives. We introduce a novel approach of multi-stage cascaded deconvolution, in which the output feature maps of each dense block are reused by concatenating them with the feature maps of the corresponding deconvolution block. These combined feature maps are progressed along the network in a cascaded manner to yield the target output.
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2018.07.012