A stereo matching algorithm based on the improved PSMNet

Bibliographic Details
Published in:	PLoS ONE, 2021-08, Vol. 16 (8), p. e0251657
Main Authors:	Huang, Zedong; Gu, Jinan; Li, Jing; Yu, Xuefei
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Artificial neural networks; Biology and Life Sciences; Computer and Information Sciences; Costs; Databases, Factual; Deep learning; Feature extraction; Humans; Ill posed problems; Image processing; Image Processing, Computer-Assisted; Information processing; Machine learning; Machine vision; Mechanical engineering; Methods; Model matching; Models, Theoretical; Modules; Neural networks; Neural Networks, Computer; Parallax; Physical Sciences; Research and Analysis Methods; Semantics; Software; Teaching methods; Texture
Description
Summary:	Deep learning based on convolutional neural networks (CNNs) has been successfully applied to stereo matching, greatly improving speed and accuracy over traditional methods. However, existing CNN-based stereo matching frameworks often encounter two problems. First, the networks have many parameters, which makes the matching run time too long. Second, disparity estimation is inadequate in regions where reflections, repeated textures, and fine structures lead to ill-posed problems. A lightweight improvement of the PSMNet (Pyramid Stereo Matching Network) model is proposed to address the poor matching commonly seen in ill-conditioned areas such as repeated-texture and weak-texture regions. In the feature extraction part, ResNeXt is introduced to learn unary features, and an ASPP (Atrous Spatial Pyramid Pooling) module is trained to extract multiscale spatial feature information. A feature fusion module fuses the feature information of different scales to construct the matching cost volume. The improved 3D CNN uses a stacked encoder-decoder structure to further regularize the matching cost volume and learn the correspondence between feature points at different disparities. Finally, the disparity map is obtained by regression. The method is evaluated on the Scene Flow, KITTI 2012, and KITTI 2015 stereo datasets. The experiments show that the proposed stereo matching network achieves prediction accuracy comparable to PSMNet while running much faster.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0251657
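
The summary's final step, obtaining the disparity map "by regression" from the regularized cost volume, can be illustrated with a small sketch. The record does not say which regression the authors use; the code below assumes the soft-argmin formulation commonly associated with PSMNet-style networks, in which each pixel's disparity is the probability-weighted mean over all candidate disparities. The function name `soft_argmin_disparity` and the `(D, H, W)` array layout are hypothetical choices made for this illustration, not taken from the paper.

```python
import numpy as np


def soft_argmin_disparity(cost_volume):
    """Regress a disparity map from a matching cost volume.

    Assumption (not stated in the record): soft-argmin regression.
    cost_volume has shape (D, H, W); lower cost means a better match.
    Returns an (H, W) array of sub-pixel disparity estimates.
    """
    cost = np.asarray(cost_volume, dtype=np.float64)

    # Softmax over the negated cost along the disparity axis, so that
    # low-cost candidates receive high probability. Subtracting the
    # per-pixel maximum keeps the exponentials numerically stable.
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=0, keepdims=True)

    # Expected disparity: sum over d of d * p(d), per pixel.
    candidates = np.arange(cost.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (prob * candidates).sum(axis=0)


# Toy usage: 8 candidate disparities on a 2x2 image, with a clear
# cost minimum at disparity 3 for every pixel.
toy_cost = np.full((8, 2, 2), 10.0)
toy_cost[3] = 0.0
toy_disparity = soft_argmin_disparity(toy_cost)  # each value is close to 3
```

Because the result is a weighted mean rather than a hard argmin, it is differentiable and can take sub-pixel values, which is the usual reason this kind of regression is preferred when training end to end.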