MFCANet: A road scene segmentation network based on Multi-Scale feature fusion and context information aggregation
Published in: Journal of Visual Communication and Image Representation, 2024-02, Vol. 98, p. 104055, Article 104055
Format: Article
Language: English
Summary:
- A SPIA module is proposed that effectively improves semantic information extraction and understanding.
- A context information aggregation network is designed as the decoder, in which semantic information, spatial information, and multi-scale context features are considered.
- A PIC module is proposed that uses cross-scale self-attention fusion to model pixel information globally.
- Experimental results on the CamVid and Cityscapes datasets demonstrate superior performance over the state of the art.
Road scene segmentation is a basic task in autonomous driving. Recent representative scene segmentation methods adopt fully convolutional networks based on the encoder-decoder framework. However, this framework can lose fine-grained image information during down-sampling, feature extraction, and feature fusion, resulting in blurred boundary details and fragmented segmentation results. In this work, a road scene segmentation network based on multi-scale feature fusion and context information aggregation is proposed, in which context information guides feature fusion and enhances semantic feature extraction. Three plug-and-play modules are designed to extract multi-scale features with strong semantic information from high-level features, compensate for the loss of spatial information in the up-sampling stage, and capture the information dependence among pixels to improve pixel-by-pixel segmentation. Experimental results on CamVid and Cityscapes show that the proposed multi-scale feature fusion and context information aggregation network (MFCANet) achieves satisfactory performance compared with state-of-the-art segmentation methods.
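The multi-scale fusion the abstract describes can be sketched minimally. The snippet below is a hypothetical illustration of the general encoder-decoder fusion pattern (not the authors' MFCANet implementation): a low-resolution, semantically rich feature map is upsampled back to the resolution of an earlier high-resolution map and the two are concatenated along the channel axis, so spatial detail lost during down-sampling can be compensated for.

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbor upsampling of a (C, H, W) feature map:
    # each spatial cell is repeated `factor` times along H and W.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(high_res, low_res):
    # Upsample the coarse, semantically strong map to the fine map's
    # resolution, then concatenate along channels -- a common fusion
    # scheme in encoder-decoder segmentation networks.
    factor = high_res.shape[1] // low_res.shape[1]
    up = upsample_nearest(low_res, factor)
    return np.concatenate([high_res, up], axis=0)

# Example: fuse a 16-channel 32x32 encoder map with a
# 64-channel 8x8 deep-layer map.
fine = np.random.rand(16, 32, 32)
coarse = np.random.rand(64, 8, 8)
fused = fuse_multiscale(fine, coarse)
print(fused.shape)  # (80, 32, 32)
```

In practice the fusion would be followed by learned convolutions (and, per the summary above, guided by context information), but the shape bookkeeping is the same.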
ISSN: 1047-3203, 1095-9076
DOI: 10.1016/j.jvcir.2024.104055