
Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

Bibliographic Details
Published in:	Engineering applications of artificial intelligence 2025-01, Vol.139, p.109626, Article 109626
Main Authors:	Wu, Wangyu, Dai, Tianhong, Chen, Zhenhong, Huang, Xiaowei, Xiao, Jimin, Ma, Fei, Ouyang, Renrong
Format:	Article
Language:	English
Subjects:	Contrastive learning; Semantic segmentation; Vision Transformer; Weakly supervised learning

Description
Summary:	Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has garnered significant attention due to its cost-effectiveness. Typically, the framework uses image-level labels as training data to generate pixel-level pseudo-labels, which are then refined. Recently, methods based on Vision Transformers (ViT) have demonstrated superior capabilities in generating reliable pseudo-labels, particularly in recognizing complete object regions. However, current ViT-based approaches have two limitations: their use of patch embeddings is prone to being dominated by certain abnormal patches, and many of them are multi-stage methods whose training is lengthy and time-consuming, which makes them inefficient. Therefore, in this paper, we introduce a novel ViT-based WSSS method named Adaptive Patch Contrast (APC) that significantly enhances patch embedding learning for improved segmentation effectiveness. APC uses an Adaptive-K Pooling (AKP) layer to address the limitations of previous max-pooling selection methods. Additionally, we propose Patch Contrastive Learning (PCL) to enhance patch embeddings, further improving the final results. We develop an end-to-end, single-stage framework that does not rely on Class Activation Maps (CAM), which improves training efficiency. Experimental results demonstrate that our method performs exceptionally well on public datasets, outperforming other state-of-the-art WSSS methods with a shorter training time.
	• We propose Adaptive-K Pooling to reduce the impact of outliers and improve segmentation.
	• We introduce Patch Contrastive Learning to improve intra-class compactness and label quality.
	• We propose an end-to-end ViT framework for WSSS that boosts efficiency without CAM.
ISSN:	0952-1976
DOI:	10.1016/j.engappai.2024.109626
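The Summary says Adaptive-K Pooling addresses the limitations of max-pooling selection, where a single abnormal patch can dominate the image-level score. This record does not give the paper's rule for choosing K, so the following is only a minimal Python sketch under a stated assumption: K is a fixed fraction (the hypothetical parameter `ratio`) of the number of patches, and the image-level score is the mean of the K highest patch scores. The names `max_pool` and `top_k_mean_pool` are illustrative and do not come from the paper.

```python
import math
from typing import Sequence


def max_pool(scores: Sequence[float]) -> float:
    """Image-level score taken from the single highest-scoring patch."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return max(scores)


def top_k_mean_pool(scores: Sequence[float], ratio: float = 0.25) -> float:
    """Hypothetical top-k pooling: mean of the k highest patch scores.

    k = ceil(ratio * number_of_patches), so k scales with the patch count.
    The ratio is an illustrative assumption, not the paper's AKP rule.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    k = max(1, math.ceil(ratio * len(scores)))
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


# 16 patch scores for one class: background plus one abnormal patch.
background = [0.1] * 15 + [0.95]
# 16 patch scores where the object genuinely covers six patches.
obj = [0.8] * 6 + [0.1] * 10

print(max_pool(background), top_k_mean_pool(background))
print(max_pool(obj), top_k_mean_pool(obj))
```

With one outlier among 16 patches, max pooling scores the background image at 0.95, higher than the genuine object (0.8), while the top-k mean (k = 4) gives about 0.31 for the background and still about 0.8 for the object. Averaging over several patches is what dilutes the single outlier.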
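The Summary describes Patch Contrastive Learning only as enhancing patch embeddings and improving intra-class compactness. To illustrate that general idea, and not the paper's exact loss, here is a generic supervised contrastive (InfoNCE-style) loss over patch embeddings in pure Python. It assumes each patch already carries a class label (for example, a pseudo-label); the function name `patch_contrastive_loss` and the `temperature` default of 0.1 are hypothetical choices.

```python
import math
from typing import List, Sequence


def _l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def patch_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Generic supervised contrastive loss over patch embeddings (illustrative sketch).

    For each anchor patch, other patches with the same label are positives and
    all remaining patches act as negatives. Anchors without a positive are skipped.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")

    z = [_l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0

    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity from anchor i to every other patch.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all non-anchor patches, shifted by the max for stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Average negative log-probability assigned to this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0


# Two tight clusters of 2-D patch embeddings.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(patch_contrastive_loss(emb, [0, 0, 1, 1]))  # small: positives are nearest neighbours
print(patch_contrastive_loss(emb, [0, 1, 0, 1]))  # large: positives are far apart
```

When same-class patches already lie close together, the loss is near zero; with the same embeddings but mixed-up labels it is large. Minimizing such a loss therefore pulls same-class patch embeddings together, which is the intra-class compactness the Summary refers to.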