Learning discriminative foreground-and-background features for few-shot segmentation

Bibliographic Details
Published in:Multimedia tools and applications 2024-05, Vol.83 (18), p.55999-56019
Main Authors: Jiang, Cong, Zhou, Yange, Liu, Zhaoshuo, Feng, Chaolu, Li, Wei, Yang, Jinzhu
Format: Article
Language:English
Description
Summary:Few-shot Semantic Segmentation (FSS) endeavors to segment novel categories in a query image by referring to a support set comprising only a few annotated examples. Many existing FSS methods adopt the prototype learning paradigm and concentrate on optimizing the matching mechanism. However, these approaches tend to overlook the discrimination between foreground and background features. Consequently, the segmentation results are often imprecise for intricate structures such as boundaries and small objects. In this study, we introduce the Discriminative Foreground-and-Background feature learning Network (DFBNet) to enhance the distinguishability of bilateral features. DFBNet comprises three major modules: a multi-level self-matching module (MSM), a feature separation module (FSM), and a semantic alignment module (SAM). The MSM generates prior masks separately for the foreground and background, employing a self-matching strategy across different feature levels. These prior masks are subsequently used as scaling factors within the FSM, where the features of the query’s foreground and background are independently scaled up and then concatenated along the channel dimension. The SAM, based on a two-layer Transformer encoder, then refines the features, creating a greater distinction between the foreground and background features. The performance of DFBNet is evaluated on the PASCAL-5^i and COCO-20^i benchmarks, demonstrating its superiority over existing solutions and establishing new state-of-the-art results in few-shot semantic segmentation. The code will be released if this paper is accepted.
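Illustrative note (not part of the published record): the feature separation step described in the summary can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function name `separate_features`, the tensor shapes, and the choice to model "scaling up" as multiplication by (1 + prior) are all hypothetical; the summary only says that the prior masks act as scaling factors and that the two scaled feature sets are concatenated along the channel dimension.

```python
import numpy as np


def separate_features(query_feat, fg_prior, bg_prior):
    """Scale query features by each prior mask, then concatenate along channels.

    query_feat: array of shape (C, H, W)
    fg_prior, bg_prior: arrays of shape (H, W), assumed to lie in [0, 1]
    Returns an array of shape (2 * C, H, W).
    """
    spatial = query_feat.shape[1:]
    if fg_prior.shape != spatial or bg_prior.shape != spatial:
        raise ValueError("prior masks must match the spatial size of the features")
    # Assumption: "scaled up" is modelled as multiplication by (1 + prior),
    # so locations with a high prior are amplified and none are suppressed.
    fg_feat = query_feat * (1.0 + fg_prior)[None, :, :]
    bg_feat = query_feat * (1.0 + bg_prior)[None, :, :]
    # Channel-wise concatenation: foreground channels first, then background.
    return np.concatenate([fg_feat, bg_feat], axis=0)


# Toy inputs; in the paper the priors come from the MSM, which is not modelled here.
rng = np.random.default_rng(0)
query_feat = rng.standard_normal((4, 3, 3))
fg_prior = rng.random((3, 3))
bg_prior = 1.0 - fg_prior  # hypothetical stand-in for a separately computed prior
separated = separate_features(query_feat, fg_prior, bg_prior)
print(separated.shape)  # (8, 3, 3)
```

Keeping the two scaled copies in separate channel groups, rather than summing them, leaves it to a later module (the SAM in the summary) to decide how to combine foreground and background evidence.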
ISSN:1573-7721
1380-7501
DOI:10.1007/s11042-023-17708-5