Zhenghua Xu, Hening Wang, Runhe Yang, Yuchen Yang, Weipeng Liu, Thomas Lukasiewicz
Recent advances show that both convolutional layers and Transformer blocks have their own advantages in feature learning for medical image analysis. However, existing models combining CNNs and Transformers cannot effectively integrate the features extracted by the two networks. In this work, we propose a new semi-supervised medical image segmentation method based on aggregated mutual learning between CNNs and Transformers, denoted AML-CT, which consists of an auxiliary module and a main network. Specifically, the auxiliary module consists of two segmentation subnetworks, one CNN-based and one Transformer-based, which extract features from different perspectives. To enhance the integration of image features from these distinct segmentation networks, a Cross-Branch Feature Fusion module is proposed, which effectively fuses local and global information via internal cross-fusion of feature maps between the networks. Then, to aggregate the image features extracted by the auxiliary module, a three-branch network (TB-net) structure is further proposed to learn the joint features and facilitate the aggregation of multi-source information. Experimental results on two public datasets demonstrate that (i) AML-CT successfully accomplishes medical image segmentation with limited labeled data, outperforming recent mainstream semi-supervised segmentation methods, and (ii) ablation studies confirm that each module of AML-CT contributes to the performance improvement.
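To make the cross-fusion idea concrete, the following is a minimal NumPy sketch of how feature maps from a CNN branch (local detail) and a Transformer branch (global context) could be cross-fused before aggregation. The abstract does not specify the fusion equations, so the gating scheme, the function name `cross_branch_fuse`, and the tensor shapes below are all illustrative assumptions, not the paper's actual Cross-Branch Feature Fusion module.

```python
import numpy as np

def cross_branch_fuse(f_cnn: np.ndarray, f_trans: np.ndarray):
    """Cross-fuse two same-shape feature maps of shape (C, H, W).

    Each branch's output is enriched with a gated share of the other
    branch's features, so local and global information mix before the
    main network aggregates them. The sigmoid gates are a hypothetical
    stand-in for whatever learned fusion the actual module uses.
    """
    assert f_cnn.shape == f_trans.shape
    g_cnn = 1.0 / (1.0 + np.exp(-f_cnn))      # gate from CNN branch
    g_trans = 1.0 / (1.0 + np.exp(-f_trans))  # gate from Transformer branch
    fused_cnn = f_cnn + g_cnn * f_trans       # inject global context
    fused_trans = f_trans + g_trans * f_cnn   # inject local detail
    return fused_cnn, fused_trans

# Toy feature maps: 4 channels over an 8x8 spatial grid
a = np.random.randn(4, 8, 8)
b = np.random.randn(4, 8, 8)
fa, fb = cross_branch_fuse(a, b)
print(fa.shape, fb.shape)  # (4, 8, 8) (4, 8, 8)
```

In a real implementation the gates would be learned (e.g., small convolutions followed by a sigmoid), and the fused maps from both branches would then feed the three-branch main network described above.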