Mutual learning with reliable pseudo label for semi-supervised medical image segmentation
Semi-supervised learning has garnered
Cross-Modality Interaction Network for Medical Image Fusion
Multi-modal medical image fusion maximizes the complementary information from diverse modality images by integrating the source images. The fused medical image can offer richer and more accurate information than any single source image. Unfortunately, existing deep-learning-based medical image fusion methods generally rely on convolutional operations, which may not effectively capture global information such as spatial relationships or shape features within and across image modalities. To address this problem, we propose a unified AI-Generated Content (AIGC)-based medical image fusion method, termed Cross-Modal Interactive Network (CMINet). CMINet integrates a recursive transformer with an interactive convolutional neural network (CNN). Specifically, the recursive transformer is designed to capture extended spatial and temporal dependencies within modalities, while the interactive CNN extracts and merges local features across modalities. Benefiting from cross-modality interaction learning, the proposed method can generate fused images with rich structural and functional information. Additionally, the recursive network is structured to reduce the parameter count, which is beneficial for deployment on resource-constrained devices. Comprehensive experiments on multi-modal medical images (MRI and CT, MRI and PET, and MRI and SPECT) demonstrate that the proposed method outperforms state-of-the-art fusion methods both subjectively and objectively.
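The abstract above describes cross-modality fusion only at a high level. As an illustrative baseline (not the CMINet architecture itself), a classical activity-level-weighted fusion of two modality images can be sketched in pure Python: each pixel of the fused image is a convex combination of the two source pixels, weighted by the local signal energy in each modality. The window size and the energy measure are assumptions chosen for illustration.

```python
import math

def local_energy(img, r, c, k=1):
    """Sum of squared intensities in a (2k+1) x (2k+1) window around (r, c)."""
    h, w = len(img), len(img[0])
    e = 0.0
    for i in range(max(0, r - k), min(h, r + k + 1)):
        for j in range(max(0, c - k), min(w, c + k + 1)):
            e += img[i][j] ** 2
    return e

def fuse(img_a, img_b, k=1):
    """Pixel-wise convex combination weighted by a softmax of local energies,
    so the modality with more local activity dominates at each position."""
    h, w = len(img_a), len(img_a[0])
    fused = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            ea = local_energy(img_a, r, c, k)
            eb = local_energy(img_b, r, c, k)
            wa = math.exp(ea) / (math.exp(ea) + math.exp(eb))
            fused[r][c] = wa * img_a[r][c] + (1.0 - wa) * img_b[r][c]
    return fused

# Toy 4x4 "MRI" and "CT" patches with complementary structure.
mri = [[0.9, 0.8, 0.1, 0.0],
       [0.9, 0.7, 0.1, 0.0],
       [0.2, 0.1, 0.1, 0.0],
       [0.1, 0.0, 0.0, 0.0]]
ct  = [[0.0, 0.1, 0.8, 0.9],
       [0.0, 0.1, 0.7, 0.9],
       [0.0, 0.0, 0.2, 0.3],
       [0.0, 0.0, 0.1, 0.2]]
out = fuse(mri, ct)
```

Because each fused pixel is a convex combination, it always lies between the two source intensities; learned methods such as CMINet replace this hand-crafted weighting with features produced by the transformer and CNN branches.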
A nested self-supervised learning framework for 3-D semantic segmentation-driven multi-modal medical image fusion
The successful fusion of 3-D multi-modal medical images depends both on characteristics specific to each imaging modality and on spatial semantic features that are consistent across modalities. However, the inherent variability in the appearance of these images poses a significant challenge to the reliable learning of semantic information. To address this issue, this paper proposes a nested self-supervised learning framework for 3-D semantic segmentation-driven multi-modal medical image fusion. The proposed approach uses contrastive learning to extract modality-specific multi-scale features from each modality with a U-Net (CU-Net). Subsequently, it employs geometric spatial consistency learning through a fusion convolutional decoder (FCD) and a geometric matching network (GMN) to ensure that consistent semantic representations are obtained within the same 3-D regions across modalities. Additionally, a hybrid multi-level loss is introduced to facilitate the learning of fused images. Finally, we leverage the optimally specified multi-modal features for fusion and brain tumor lesion segmentation. The proposed approach enables cooperative learning between the 3-D fusion and segmentation tasks through an innovative nested self-supervised strategy, striking a balance between semantic consistency and visual specificity during multi-modal feature extraction. The fusion results achieved mean SSIM, PSNR, NMI, and SFR of 0.9310, 27.8861, 1.5403, and 1.0896, respectively. The segmentation results achieved mean Dice, sensitivity (Sen), specificity (Spe), and accuracy (Acc) of 0.8643, 0.8736, 0.9915, and 0.9911, respectively. The experimental findings demonstrate that our approach outperforms 11 other state-of-the-art fusion methods and 5 classical U-Net-based segmentation methods in terms of 4 objective metrics and qualitative evaluation.
The code of the proposed method is available at https://github.com/ImZhangyYing/NLSF.
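The contrastive learning mentioned in this abstract is described only at a high level; a common instantiation is the InfoNCE loss, which pulls features of the same 3-D region seen in two modalities together while pushing unrelated regions apart. The sketch below is a generic pure-Python version of InfoNCE, not the paper's actual loss; the temperature value and cosine similarity are standard assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: negative log-probability of picking the positive pair
    among all candidates, under a temperature-scaled softmax."""
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)

# Toy features: the positive pair is the same region seen in two modalities.
anchor    = [1.0, 0.2, 0.0]
positive  = [0.9, 0.3, 0.1]
negatives = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
loss = info_nce(anchor, positive, negatives)
```

The loss is always positive and shrinks as the anchor-positive similarity grows relative to the negatives, which is what drives modality-consistent semantic features in frameworks like the one above.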
Mirror U-Net: Marrying Multimodal Fission with Multi-task Learning for Semantic Segmentation in Medical Imaging
Positron Emission Tomography (PET) and Computed Tomography (CT) are routinely used together to detect tumors. PET/CT segmentation models can automate tumor delineation; however, current multimodal models do not fully exploit the complementary information in each modality, as they either concatenate PET and CT data or fuse them at the decision level. To address this, we propose Mirror U-Net, which replaces traditional fusion methods with multimodal fission by factorizing the multimodal representation into modality-specific decoder branches and an auxiliary multimodal decoder. At these branches, Mirror U-Net assigns a task tailored to each modality to reinforce unimodal features while preserving multimodal features in the shared representation. In contrast to previous methods that use either fission or multi-task learning alone, Mirror U-Net combines both paradigms in a unified framework. We explore various task combinations and examine which parameters to share in the model. We evaluate Mirror U-Net on the AutoPET PET/CT and the multimodal MSD BrainTumor datasets, demonstrating its effectiveness in multimodal segmentation and achieving state-of-the-art performance on both. Code: https://github.com/Zrrr1997
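The fission idea in this abstract, a shared multimodal representation branching into modality-specific decoders plus an auxiliary multimodal decoder, can be shown as a minimal structural sketch. Everything below is a toy with invented dimensions and plain matrix products standing in for the real U-Net encoder and decoders; it is not the paper's configuration.

```python
import random

random.seed(0)

def linear(in_dim, out_dim):
    """A toy dense layer: a weight matrix with fixed random initialization."""
    return [[random.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply(w, x):
    """Matrix-vector product standing in for a real network layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

class MirrorFission:
    """Shared encoder -> three decoder branches (fission, not fusion):
    one per modality plus an auxiliary multimodal branch."""
    def __init__(self, d_in=8, d_shared=4, d_out=8):
        self.encoder   = linear(2 * d_in, d_shared)  # joint PET+CT encoding
        self.dec_pet   = linear(d_shared, d_out)     # PET-specific task branch
        self.dec_ct    = linear(d_shared, d_out)     # CT-specific task branch
        self.dec_multi = linear(d_shared, d_out)     # auxiliary multimodal branch

    def forward(self, pet, ct):
        shared = apply(self.encoder, pet + ct)       # shared representation
        return (apply(self.dec_pet, shared),
                apply(self.dec_ct, shared),
                apply(self.dec_multi, shared))

model = MirrorFission()
y_pet, y_ct, y_multi = model.forward([0.1] * 8, [0.2] * 8)
```

The point of the structure is that each branch can receive its own loss (a modality-tailored task), so gradients from all three tasks shape the single shared representation.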
Dual Attention Encoder with Joint Preservation for Medical Image Segmentation
Transformers have recently gained considerable popularity for capturing long-range dependencies in medical image segmentation. However, most transformer-based segmentation methods primarily focus on modeling global dependencies and fail to fully explore the complementary nature of different dimensional dependencies within features. These methods simply treat the aggregation of multi-dimensional dependencies as auxiliary modules for incorporating context into the Transformer architecture, thereby limiting the model's capability to learn rich feature representations. To address this issue, we introduce the Dual Attention Encoder with Joint Preservation (DANIE) for medical image segmentation, which synergistically aggregates spatial-channel dependencies across both local and global areas through attention learning. Additionally, we design a lightweight aggregation mechanism, termed Joint Preservation, which learns a composite feature representation, allowing different dependencies to complement each other. Without bells and whistles, our DANIE significantly improves on the performance of previous state-of-the-art methods on five popular medical image segmentation benchmarks, including Synapse, ACDC, ISIC 2017, ISIC 2018, and GlaS.
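The spatial-channel dual attention this abstract refers to can be illustrated with a minimal pure-Python sketch: channel attention reweights whole feature channels by their global-average-pooled response, and spatial attention reweights positions by their cross-channel mean. This is a generic simplification in the spirit of dual-attention modules, not DANIE's actual design; the softmax-based weighting is an assumption.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def channel_attention(feat):
    """feat: list of channels, each an HxW grid; reweight channels by a
    softmax over their global-average-pooled responses."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in feat]
    w = softmax(pooled)
    return [[[w[c] * v for v in row] for row in feat[c]]
            for c in range(len(feat))]

def spatial_attention(feat):
    """Reweight each spatial position by a softmax over its
    cross-channel mean activation."""
    h, wd = len(feat[0]), len(feat[0][0])
    mean = [[sum(ch[i][j] for ch in feat) / len(feat) for j in range(wd)]
            for i in range(h)]
    flat = softmax([mean[i][j] for i in range(h) for j in range(wd)])
    attn = [[flat[i * wd + j] for j in range(wd)] for i in range(h)]
    return [[[attn[i][j] * ch[i][j] for j in range(wd)] for i in range(h)]
            for ch in feat]

# Toy 2-channel, 2x2 feature map run through both attentions in sequence.
feat = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]]]
out = spatial_attention(channel_attention(feat))
```

Composing the two branches is the simplest form of spatial-channel aggregation; mechanisms like Joint Preservation instead learn how the two dependency types combine rather than applying them in a fixed order.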