
End-to-End Deep Learning Method with Disparity Correction for Stereo Matching

Bibliographic Details
Published in: Arabian Journal for Science and Engineering, 2024-03, Vol. 49 (3), pp. 3331-3345
Main Authors: Zhou, Zhiyu, Liu, Mingxuan, Guo, Jiusen, Wang, Yaming, Yang, Donghe, Zhu, Zefei
Format: Article
Language:English
Summary: At present, for end-to-end deep learning networks in stereo matching, how to perceive ample image detail at an acceptable computational cost, and thereby improve the final matching accuracy, has become a hot research topic. In this paper, we design a cascaded disparity correction network based on DispNet, which refines the initial disparity image by combining intermediate outputs of the trunk network. The proposed method is named ETE–DC–CNN (end to end–disparity correction–convolutional neural network). In the trunk network, a mixed dilated convolution module is employed to extract features from the original image, the feature maps are assembled into a 3D disparity space volume through depthwise-separable convolution, and the initial disparity image is then obtained through a 2D encoder–decoder module. In the correction network, gradient information from the initial feature maps is combined, under the guidance of the initial disparity image, to reconstruct a modified matching cost volume, and a smaller encoder–decoder module is then trained to refine the result to the sub-pixel level. The algorithm has been verified on the SceneFlow and KITTI 2012/2015 datasets. On the KITTI 2012 test set, the proposed method achieves 3.46, 2.12, 1.19, and 0.6 on the E_r (>2 px), E_r (>3 px), E_r (>5 px), and E_pe indicators in the non-occluded area, and 4.04, 2.55, 1.58, and 0.7 in the global area, outperforming the backbone network DispNet.
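The pipeline described in the summary can be sketched roughly as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the cost volume here uses simple feature concatenation rather than the paper's depthwise-separable construction, the correction stage is modeled as an additive residual guided by feature gradients, and all module names, layer sizes, and dilation rates are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedDilatedBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, fused by a 1x1 conv.
    Stands in for the 'mixed dilated convolution' feature extractor (rates assumed)."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        return F.relu(self.fuse(torch.cat([b(x) for b in self.branches], dim=1)))

def build_cost_volume(feat_l, feat_r, max_disp):
    """Concatenation-based disparity space volume of shape B x 2C x D x H x W.
    A common construction; the paper's depthwise-separable variant is not reproduced here."""
    b, c, h, w = feat_l.shape
    vol = feat_l.new_zeros(b, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        vol[:, :c, d, :, :] = feat_l
        if d > 0:
            vol[:, c:, d, :, d:] = feat_r[:, :, :, :-d]
        else:
            vol[:, c:, d, :, :] = feat_r
    return vol

class CorrectionNet(nn.Module):
    """Small encoder-decoder stand-in that refines the initial disparity using
    horizontal/vertical gradients of the feature map as extra guidance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, init_disp, feat):
        # Simple finite-difference gradients of the first feature channel.
        gx = F.pad(feat[:, :1, :, 1:] - feat[:, :1, :, :-1], (0, 1))
        gy = F.pad(feat[:, :1, 1:, :] - feat[:, :1, :-1, :], (0, 0, 0, 1))
        # Sub-pixel refinement modeled as an additive residual on the initial disparity.
        residual = self.net(torch.cat([init_disp, gx, gy], dim=1))
        return init_disp + residual

The trunk network's 2D encoder–decoder that regresses the initial disparity from the cost volume is omitted; the sketch only illustrates how dilated feature extraction, cost volume construction, and a lightweight correction stage could fit together.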
ISSN:2193-567X
1319-8025
2191-4281
DOI:10.1007/s13369-023-07985-5