https://doi.org/10.1140/epjs/s11734-025-01780-z
Regular Article
Hybrid deep learning approach using U-Net with attention gates for colorectal cancer segmentation
1 School of Computer Science and Engineering, Vellore Institute of Technology (VIT), 600127, Chennai, India
2 Centre for Cyber Physical Systems, Vellore Institute of Technology (VIT), 600127, Chennai, India
Received: 19 April 2025
Accepted: 27 June 2025
Published online: 8 July 2025
Colorectal cancer (CRC) is a leading cause of cancer-related deaths, and accurate polyp segmentation is key to early CRC diagnosis and treatment. This work proposes a deep learning image segmentation framework built on a U-Net with attention gates (AGs) and systematically evaluates how the choice of encoder affects colorectal polyp segmentation. The encoders compared are CNN-based (DenseNet, DeepLabV3, ConvNeXt), Transformer-based (ViT, Swin), and hybrid CNN–Transformer models (MaxViT, SegFormer, EdgeNext). The novelty lies in this comprehensive comparison across diverse architectural paradigms, with AGs used for feature refinement, on three publicly available datasets: Kvasir-SEG, CVC-ClinicDB, and CVC-ColonDB. The comparison provides insights into optimal encoder choices for this specific medical imaging task. Performance was assessed with the Dice coefficient, the mean intersection over union (mIoU) score, and ROC analysis. Overall, ConvNeXt-U-Net with AGs achieved the highest segmentation scores, with a Dice score of 0.9116 and an mIoU score of 0.8522, and its ROC analysis yielded an AUC of 0.99, further supporting its accuracy. The loss plots indicate effective model training, and selecting models by validation performance helps ensure good generalization of the reported metrics, although some architectures show signs of overfitting in later training epochs. Our results suggest that hybrid CNN–Transformer architectures can exploit both local and global features, which makes them particularly beneficial for medical image segmentation. This research highlights the potential of advanced deep learning methods to assist in CRC detection and treatment planning.
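The abstract reports Dice and mIoU scores for a U-Net whose skip connections pass through attention gates. As a minimal sketch, assuming binary masks and the additive attention gate formulation from Attention U-Net (Oktay et al., 2018) rather than the authors' exact implementation, both the metrics and the gating step can be written in NumPy. All function names, weight shapes, and the per-image averaging used for mIoU are illustrative assumptions; the paper's own averaging protocol is not stated in the abstract.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks (eps avoids 0/0 on empty masks)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


def iou_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """IoU (Jaccard) = |P ∩ T| / |P ∪ T| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((inter + eps) / (union + eps))


def mean_iou(preds, targets) -> float:
    """Assumption: mIoU taken as the mean of per-image foreground IoU."""
    return float(np.mean([iou_score(p, t) for p, t in zip(preds, targets)]))


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, w_x, w_g, b_g, psi, b_psi):
    """Additive attention gate with 1x1 convolutions written as channel matmuls.

    x:   skip-connection features, shape (H, W, C_x)
    g:   gating signal, shape (H, W, C_g), assumed already resampled to x's grid
    w_x: (C_x, C_int), w_g: (C_g, C_int), b_g: (C_int,), psi: (C_int,), b_psi: scalar
    Returns the gated features x * alpha and the coefficients alpha, shape (H, W, 1).
    """
    q = np.maximum(x @ w_x + g @ w_g + b_g, 0.0)  # ReLU(W_x x + W_g g + b_g)
    alpha = sigmoid(q @ psi + b_psi)[..., None]   # per-pixel attention in [0, 1]
    return x * alpha, alpha


if __name__ == "__main__":
    # Toy masks: one overlapping pixel, two foreground pixels in each mask.
    p = np.array([[1, 1, 0, 0]])
    t = np.array([[1, 0, 1, 0]])
    print("Dice:", dice_coefficient(p, t), "IoU:", iou_score(p, t))

    # Random placeholder weights; in a trained network these are learned.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 4, 8))
    g = rng.standard_normal((4, 4, 16))
    gated, alpha = attention_gate(
        x, g,
        rng.standard_normal((8, 4)), rng.standard_normal((16, 4)),
        np.zeros(4), rng.standard_normal(4), 0.0,
    )
    print("gated shape:", gated.shape, "alpha shape:", alpha.shape)
```

In the full architecture the gating signal g comes from the coarser decoder stage, so one of the two inputs must be resampled before the additive step; the sketch skips that by taking both tensors at the same resolution.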
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
© The Author(s), under exclusive licence to EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature 2025