Haoyu Guo, Liuliu Shi, Jinlong Liu
Quant Imaging Med Surg. 2024 Dec 5;14(12):8331-8346. doi: 10.21037/qims-24-1022. Epub 2024 Nov 1.
Background: U-Net and its variants have driven significant advances in medical image segmentation. However, the encoder-decoder structure of these models often loses spatial information during downsampling. Skip connections can help address this issue, but they may also introduce excessive irrelevant background information. Additionally, medical images exhibit large scale variations and complex tissue structures, making it difficult for existing models to accurately separate tissues from the background. To address these issues, we developed the Res2Net-ConvFormer-Dilation-UNet (Res2-CD-UNet), a multi-scale feature extraction network for medical image segmentation.
Methods: This study presents a novel U-shaped segmentation network that employs Res2Net as its backbone and incorporates a convolution-style transformer in the encoding stage to strengthen global attention. Additionally, a novel channel feature fusion block (CFFB) is introduced in the skip connections to minimize the effect of background noise.
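For intuition, the two core components can be sketched in PyTorch as below. This is a minimal illustration, assuming the standard Res2Net hierarchical split-convolution and an SE-style channel gate for the skip fusion; the module names (Res2Block, ChannelFuse) are hypothetical and this is not the authors' released implementation.

import torch
import torch.nn as nn

class Res2Block(nn.Module):
    # Res2Net-style multi-scale block: input channels are split into
    # `scales` groups and 3x3 convolutions are chained hierarchically,
    # so each group sees a progressively larger receptive field.
    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, 3, padding=1) for _ in range(scales - 1))

    def forward(self, x):
        splits = torch.chunk(x, self.scales, dim=1)
        outs, prev = [splits[0]], None
        for xi, conv in zip(splits[1:], self.convs):
            prev = conv(xi if prev is None else xi + prev)
            outs.append(prev)
        return torch.cat(outs, dim=1) + x  # residual connection

class ChannelFuse(nn.Module):
    # Hypothetical channel feature fusion: encoder skip features are
    # gated by channel weights derived from the decoder features,
    # suppressing irrelevant background channels before concatenation.
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid())

    def forward(self, skip, decoder):
        return torch.cat([skip * self.gate(decoder), decoder], dim=1)

x = torch.randn(2, 64, 56, 56)
print(Res2Block(64)(x).shape)       # torch.Size([2, 64, 56, 56])
print(ChannelFuse(64)(x, x).shape)  # torch.Size([2, 128, 56, 56])

In the full network, blocks of the first kind would be stacked in the encoder and fusion of the second kind applied at each skip connection; the dilated convolutions implied by the model's name would presumably sit deeper in the network to further enlarge the receptive field.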
Results: The proposed model was evaluated on two publicly available datasets, Synapse and Seg.A.2023. On the Synapse dataset, the average Dice similarity coefficient (DSC) reached 83.92%, 1.96% higher than that of the next-best model, and the average Hausdorff distance (HD) was 14.51 mm. Of the eight organs evaluated, the model achieved the best results on four. On the Seg.A.2023 dataset, the proposed model likewise achieved the best results, with an average DSC of 93.27%.
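For reference, the two reported metrics can be computed as in the generic sketch below (not the paper's evaluation code). DSC measures volumetric overlap; HD, computed here on foreground voxel coordinates scaled by voxel spacing, measures the worst-case boundary deviation in millimetres (evaluation protocols often restrict it to surface voxels or report the 95th percentile).

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(pred, gt):
    # Dice similarity coefficient between two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def hausdorff(pred, gt, spacing=1.0):
    # Symmetric Hausdorff distance between the foreground point sets
    # of two binary masks, with coordinates scaled by voxel spacing (mm);
    # assumes isotropic spacing for simplicity.
    p = np.argwhere(pred) * spacing
    g = np.argwhere(gt) * spacing
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), bool); gt[12:42, 12:42] = True
print(f"DSC = {dice(pred, gt):.4f}, HD = {hausdorff(pred, gt):.2f}")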
Conclusions: The results of this study indicate that, compared with existing deep-learning algorithms, the proposed model segments regions of interest more accurately and extracts multi-scale features from medical images more effectively.