End-to-End Deep Learning Method with Disparity Correction for Stereo Matching
Published in: Arabian Journal for Science and Engineering, 2024-03, Vol. 49 (3), pp. 3331-3345
Main Authors: , , , , ,
Format: Article
Language: English
Summary: In end-to-end deep learning networks for stereo matching, how to perceive rich image detail at an acceptable computational cost, and thereby improve the final matching accuracy, has become a hot research topic. In this paper, we design a cascaded disparity correction network based on DispNet, which refines the initial disparity map by combining the intermediate outputs of the trunk network. The proposed method is named ETE–DC–CNN (end-to-end disparity correction convolutional neural network). In the trunk network, a mixed dilated convolution module extracts features from the original images, the feature maps are assembled into a 3D disparity-space volume through depthwise-separable convolution, and an initial disparity map is then obtained through a 2D encoder–decoder module. In the correction network, the gradient information of the initial feature maps, guided by the initial disparity map, is used to reconstruct a modified matching cost volume, and a smaller encoder–decoder module is then trained to bring the final result to sub-pixel accuracy. The algorithm has been verified on the SceneFlow and KITTI 2012/2015 datasets. On the KITTI 2012 test set, the proposed method achieves 3.46, 2.12, 1.19, and 0.6 in the non-occluded area and 4.04, 2.55, 1.58, and 0.7 in the global area on the E_r (> 2 px), E_r (> 3 px), E_r (> 5 px), and E_pe indicators, improving on the corresponding results of the backbone network DispNet.
ISSN: 2193-567X; 1319-8025; 2191-4281
DOI: 10.1007/s13369-023-07985-5