Loading…

Real-time semantic segmentation with local spatial pixel adjustment

The research of semantic segmentation networks has achieved a significant breakthrough recently. However, most part of methods have difficulty in utilizing information generated at each stage, which resulting in pixel value dislocation and blurred boundaries for small-scale objects. To overcome thes...

Full description

Saved in:
Bibliographic Details
Published in:Image and vision computing 2022-07, Vol.123, p.104470, Article 104470
Main Authors: Xiao, Cunjun, Hao, Xingjun, Li, Haibin, Li, Yaqian, Zhang, Wenming
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The research of semantic segmentation networks has achieved a significant breakthrough recently. However, most part of methods have difficulty in utilizing information generated at each stage, which resulting in pixel value dislocation and blurred boundaries for small-scale objects. To overcome these challenges, a local spatial pixel adjustment network (LSPANet) is proposed in this paper, which mainly consists of a dual-branch decoding fusion (DDF) module and a spatial pixel cross-correlation (SPCC) block. Specifically, the DDF module takes the high-level and low-level feature maps with different stages as the input, and gradually eliminates the discrepancy in the information of the feature map to fuse a variety of information extracted in the encoder stage. The SPCC block adopts the horizontal spatial pixel adjustment (HSPA) module and the vertical spatial pixel adjustment (VSPA) module to capture the relationship of each pixel value in the local horizontal and vertical space respectively, and then assign the importance to all values based on this relationship. LSPANet is evaluated on Cityscapes and Camvid datasets. The experimental results show that our network achieves 77.1% mIoU with 2 M parameters on the challenging Cityscapes dataset and the inference speed exceeds 30 FPS in a single GTX 2080 Ti GPU. •Present dual-branch decoding fusion module to fuse a variety of information.•Propose spatial pixel cross-correlation block to capture relationship in local space.•Design a local spatial pixel adjustment network for real-time semantic segmentation.
ISSN:0262-8856
1872-8138
DOI:10.1016/j.imavis.2022.104470