Jinghua Zhu, Chengying Huang, Heran Xi, Hui Cui
Neural Netw. 2025 Mar 26;188:107415. doi: 10.1016/j.neunet.2025.107415. Online ahead of print.
Transformers have shown great potential in vision tasks such as semantic segmentation. However, most existing transformer-based segmentation models neglect the cross-attention between pixel features and class features, which limits the effectiveness of transformers in this setting. Inspired by the concept of object queries in k-means Mask Transformer, we develop cluster learning and contrastive cluster assignment (CCA) for medical image segmentation. Cluster learning leverages the object queries to fit feature-level cluster centers, and contrastive cluster assignment guides pixel-level class prediction using those cluster centers. Our method is a plug-in module that can be integrated into any model. We design two networks, one for supervised and one for semi-supervised segmentation. For supervised segmentation, we equip the decoder with the proposed modules to improve pixel-level predictions; for semi-supervised segmentation, we use them to enhance the feature extraction capability of the encoder. We conduct comprehensive comparison and ablation experiments on public medical image datasets (ACDC, LA, Synapse, and ISIC2018). The results demonstrate that our proposed models consistently outperform state-of-the-art models, validating the effectiveness of the proposed method. The source code is available at https://github.com/zhujinghua1234/CCA-Seg.
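The core idea sketched in the abstract (pixel features softly assigned to query-derived cluster centers, with a contrastive objective on the assignment) can be illustrated as follows. This is a minimal NumPy sketch of one plausible reading of contrastive cluster assignment, not the paper's implementation; the function names, the cosine-similarity choice, and the temperature `tau` are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cluster_assign(pixel_feats, centers, tau=0.1):
    """Soft assignment of N pixel features to K cluster centers.

    pixel_feats: (N, D) array, centers: (K, D) array.
    Returns an (N, K) row-stochastic assignment matrix.
    Cosine similarity and temperature tau are illustrative choices.
    """
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return softmax(p @ c.T / tau)


def contrastive_assignment_loss(assign, labels):
    """InfoNCE-style loss: maximize each pixel's probability of its
    ground-truth cluster relative to all other clusters."""
    n = assign.shape[0]
    return -np.log(assign[np.arange(n), labels] + 1e-12).mean()
```

At training time, the assignment matrix would double as the pixel-level class prediction (one cluster center per class), so lowering the contrastive loss directly sharpens the segmentation output.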