A Simple and Robust Framework for Cross-Modality Medical Image Segmentation applied to Vision Transformers
Centre des Matériaux, Centre de Mise en Forme des Matériaux, Centre de Morphologie Mathématique
Dual Attention Encoder with Joint Preservation for Medical Image Segmentation
Transformers have recently gained considerable popularity for capturing long-range dependencies in medical image segmentation. However, most transformer-based segmentation methods focus primarily on modeling global dependencies and fail to fully explore the complementary nature of dependencies along different feature dimensions. These methods simply treat the aggregation of multi-dimensional dependencies as auxiliary modules for incorporating context into the Transformer architecture, thereby limiting the model’s capability to learn rich feature representations. To address this issue, we introduce the Dual Attention Encoder with Joint Preservation (DANIE) for medical image segmentation, which synergistically aggregates spatial-channel dependencies across both local and global areas through attention learning. Additionally, we design a lightweight aggregation mechanism, termed Joint Preservation, which learns a composite feature representation, allowing different dependencies to complement each other. Without bells and whistles, our DANIE significantly improves on previous state-of-the-art methods on five popular medical image segmentation benchmarks: Synapse, ACDC, ISIC 2017, ISIC 2018 and GlaS.
Improvements to U-Net
Estimated improvement on the DRIVE dataset:
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Deep-learning-based medical image segmentation methods fall mainly into two families: CNNs and Transformers. However, CNNs struggle to capture long-distance dependencies, while Transformers suffer from high computational complexity and poor local feature learning. To efficiently extract and fuse local features and long-range dependencies, this paper proposes Rolling-Unet, a CNN model combined with MLP. Specifically, we propose the core R-MLP module, which is responsible for learning the long-distance dependency in a single direction of the whole image. By controlling and combining R-MLP modules in different directions, OR-MLP and DOR-MLP modules are formed to capture long-distance dependencies in multiple directions. Further, the Lo2 block is proposed to encode both local context information and long-distance dependencies without excessive computational burden. The Lo2 block has the same parameter size and computational complexity as a 3×3 convolution. Experimental results on four public datasets show that Rolling-Unet achieves superior performance compared to state-of-the-art methods.
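To make the idea of a direction-wise "rolling" MLP concrete, here is a minimal sketch in plain Python. It is an illustration inferred only from the module's name and the abstract's description, not the paper's R-MLP: the per-channel shift rule (channel c shifted by c * step), the wrap-around behaviour, and the helper names roll_row, roll_channels and channel_mix are all assumptions. The point it shows is that after shifting each channel by a different offset along one axis, an ordinary per-pixel channel-mixing layer combines values from distant positions in the same row.

```python
def roll_row(row, shift):
    """Circularly shift a list to the right by `shift` positions."""
    shift %= len(row)
    return row[-shift:] + row[:-shift] if shift else list(row)


def roll_channels(features, step=1):
    """Shift channel c by c * step along the width axis (hypothetical scheme).

    `features` is a C x H x W nested list.
    """
    return [
        [roll_row(row, c * step) for row in channel]
        for c, channel in enumerate(features)
    ]


def channel_mix(features, weights):
    """Apply one shared C_out x C_in linear map at every pixel."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                sum(w * features[c][y][x] for c, w in enumerate(w_row))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for w_row in weights
    ]


features = [[[1, 2, 3, 4]], [[10, 20, 30, 40]]]  # 2 channels, 1 x 4 image
rolled = roll_channels(features)
mixed = channel_mix(rolled, [[1, 1]])  # one output channel summing both inputs
```

In this toy input, the output at column 0 sums channel 0 at column 0 with channel 1 at column 3, so a single pixel-wise layer already sees across the full width of the row.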
C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image
Recently, many excellent weakly supervised semantic segmentation (WSSS) works have been proposed based on class activation mapping (CAM). However, few works consider the characteristics of medical images. In this paper, we find that medical images pose two main challenges for WSSS: i) the boundary between object foreground and background is not clear; ii) the co-occurrence phenomenon is very severe in the training stage. We thus propose a Causal CAM (C-CAM) method to overcome these challenges. Our method is motivated by two cause-effect chains: the category-causality chain and the anatomy-causality chain. The category-causality chain represents how the image content (cause) affects the category (effect). The anatomy-causality chain represents how the anatomical structure (cause) affects the organ segmentation (effect). Extensive experiments were conducted on three public medical image data sets. Our C-CAM generates the best pseudo masks, with DSC of 77.26%, 80.34% and 78.15% on ProMRI, ACDC and CHAOS, compared with other CAM-like methods. The pseudo masks of C-CAM are further used to improve the segmentation performance for organ segmentation tasks. Our C-CAM achieves a DSC of 83.83% on ProMRI and a DSC of 87.54% on ACDC, which outperforms state-of-the-art WSSS methods. Our code is available at https://github.com/Tian-lab/C-CAM.
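The DSC values quoted above are Dice similarity coefficients, defined for a predicted mask P and a reference mask T as 2|P ∩ T| / (|P| + |T|). The following is a small plain-Python sketch of that metric for binary masks; the function name dice_coefficient, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are choices made here for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values with the same length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]) evaluates to 0.5: one pixel overlaps, and the two masks contain four foreground pixels in total.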
Pixel-Wise Reclassification with Prototypes for Enhancing Weakly Supervised Semantic Segmentation
Refining the seed region to obtain finely annotated pseudo masks for training a segmentation model is a crucial step in the multi-stage weakly supervised semantic segmentation (WSSS) framework. One of the most popular refinement methods, IRN, extends seed regions towards the edges in the image. However, we observed that, due to the lack of guidance from semantic information, IRN’s refinement may produce partially erroneous refinement directions. To address this issue, we leverage prototypes to recover the overlooked category semantic information in the refinement stage. We propose a prototype-based pseudo mask reclassification post-processing (PtReCl) to correct misclassified pixels in the pseudo masks, generating refined pseudo masks with more accurate coverage. Experimental evaluations demonstrate that our post-processing approach brings improvements in both pseudo mask quality and segmentation results on PASCAL VOC and MS COCO datasets, achieving state-of-the-art performance on VOC.
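As a rough illustration of what prototype-based reclassification can look like, the sketch below builds one prototype per class as the mean feature of the pixels currently carrying that label, then reassigns every pixel to the class whose prototype is most similar under cosine similarity. This is a generic nearest-prototype scheme written under assumptions; the abstract does not say how PtReCl constructs or matches its prototypes, and the names cosine, class_prototypes and reclassify are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def class_prototypes(features, labels):
    """Mean feature vector of the pixels assigned to each class label."""
    sums, counts = {}, {}
    for feat, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def reclassify(features, labels):
    """Relabel each pixel with the class of its most similar prototype."""
    protos = class_prototypes(features, labels)
    return [
        max(protos, key=lambda lab: cosine(feat, protos[lab]))
        for feat in features
    ]


# Five pixels with 2-D features; the last one looks like class 0
# but carries label 1 in the (noisy) pseudo mask.
pixel_features = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [0.95, 0.05]]
pseudo_labels = [0, 0, 1, 1, 1]
refined_labels = reclassify(pixel_features, pseudo_labels)
```

In this toy case the mislabeled last pixel is moved back to class 0 while the other four labels are unchanged.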
Weakly Supervised Semantic Segmentation
Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
Nanjing University of Science and Technology, Horizon Robotics
SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation
Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost. Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models. In this work, we are the first to demonstrate that a long-tailed distribution in the training data can cause the CAM calculated through classifier weights to be over-activated for head classes and under-activated for tail classes, due to the features shared among head and tail classes. This degrades pseudo-label quality and in turn harms final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. Specifically, we leverage the class prototypes that carry positive shared features and propose a Multi-Scaled Distribution-Weighted (MSDW) consistency loss for narrowing the gap between the CAMs generated through classifier weights and class prototypes during training. The MSDW loss counterbalances over-activation and under-activation by calibrating the shared features in head-/tail-class classifier weights. Experimental results show that our SFC significantly improves CAM boundaries and achieves new state-of-the-art performance. The project is available at https://github.com/Barrett-python/SFC.
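For readers unfamiliar with the "CAM calculated through classifier weights" that the abstract refers to, here is a plain-Python sketch of the standard class activation map: for one class, the map at each location is the classifier-weighted sum of the feature channels, followed here by ReLU and division by the peak value. The ReLU and peak normalisation are common post-processing choices assumed for this example, and the function name class_activation_map is hypothetical. The sketch does not implement SFC's prototypes or its MSDW loss.

```python
def class_activation_map(features, class_weights):
    """Classifier-weight CAM for a single class.

    `features` is a C x H x W nested list of feature maps and
    `class_weights` holds the C classifier weights of the target class.
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * channel[y][x]
    # Keep positive evidence only, then scale so the peak equals 1.
    cam = [[max(value, 0.0) for value in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


feature_maps = [[[1, 0], [0, 2]], [[0, 1], [1, 0]]]  # 2 channels, 2 x 2
cam = class_activation_map(feature_maps, [1.0, -1.0])
```

Because the map depends directly on the class's weight vector, any bias in those weights (such as the head-class over-activation the abstract describes) carries straight through to the pseudo-labels derived from it.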
WeakCLIP: Adapting CLIP for Weakly-Supervised Semantic Segmentation
Huazhong University of Science and Technology, Northwestern Polytechnical University

Table of Contents