HuiFang Wang, YaTong Liu, Jiongyao Ye, Dawei Yang, Yu Zhu
Accurate medical image segmentation is crucial for clinical diagnosis and disease treatment. However, most existing methods still struggle to extract accurate features from medical images because of blurred boundaries and varied appearances. To overcome these limitations, we propose a novel medical image segmentation network, TS-Net (Trans-Scale network), that effectively combines the advantages of CNNs and Transformers to enhance feature extraction. Specifically, we design a Multi-scale Convolution Modulation (MCM) module that simplifies the self-attention mechanism through a convolution modulation strategy, incorporating multi-scale large-kernel convolution into depthwise separable convolution to effectively extract both multi-scale global features and local features. In addition, we adopt the concept of feature complementarity and facilitate the interaction between high-level semantic features and low-level spatial features through the designed Scale Inter-active Attention (SIA) module. The proposed method is evaluated on four different types of medical image segmentation datasets, and the experimental results show that it is competitive with other state-of-the-art methods. TS-Net achieves an average Dice Similarity Coefficient (DSC) of 90.79% ± 1.01% on the public NIH dataset for pancreas segmentation, 76.62% ± 4.34% on the public MSD dataset for pancreatic cancer segmentation, 80.70% ± 6.40% on the private PROMM (Prostate Multi-parametric MRI) dataset for prostate cancer segmentation, and 91.42% ± 0.55% on the public Kvasir-SEG dataset for polyp segmentation. The results across these four segmentation tasks demonstrate the effectiveness of the Trans-Scale network.
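To make the convolution modulation idea concrete, the following is a minimal PyTorch sketch of how a module like MCM might be organized: a value branch is modulated element-wise by multi-scale large-kernel depthwise convolutions in place of softmax self-attention. It is an illustrative interpretation only; the class name MCMBlock, the kernel sizes, and all implementation details are assumptions, not the authors' actual code.

```python
# Illustrative sketch, NOT the authors' implementation of MCM.
import torch
import torch.nn as nn


class MCMBlock(nn.Module):
    """Modulates a value branch with multi-scale large-kernel depthwise features,
    replacing softmax self-attention with an element-wise product (assumed design)."""

    def __init__(self, dim: int, kernel_sizes=(7, 11, 21)):
        super().__init__()
        self.proj_in = nn.Conv2d(dim, dim, kernel_size=1)   # value branch
        self.dw_convs = nn.ModuleList([
            nn.Conv2d(dim, dim, kernel_size=k, padding=k // 2, groups=dim)
            for k in kernel_sizes                            # multi-scale depthwise convs
        ])
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.proj_in(x)
        # Sum multi-scale depthwise responses to form the modulation weights.
        a = sum(conv(x) for conv in self.dw_convs)
        return self.proj_out(a * v)                          # convolution modulation


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)
    print(MCMBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

The element-wise product keeps the cost linear in the number of pixels while the large depthwise kernels supply a wide receptive field, which is the general trade-off such convolution-modulation designs aim for.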