Hybrid deep learning approach using U-Net with attention gates for colorectal cancer segmentation

Eur. Phys. J. Spec. Top.
https://doi.org/10.1140/epjs/s11734-025-01780-z

Regular Article

Harsha Jackson¹, Nalin Iyer¹, Ananthakrishnan Balasundaram²,ᵃ and Ayesha Shaik²

¹ School of Computer Science and Engineering, Vellore Institute of Technology (VIT), 600127, Chennai, India
² Centre for Cyber Physical Systems, Vellore Institute of Technology (VIT), 600127, Chennai, India

ᵃ balasundaram.a@vit.ac.in

Received: 19 April 2025
Accepted: 27 June 2025
Published online: 8 July 2025

Abstract

Colorectal cancer (CRC) is a leading cause of cancer-related deaths, and accurate polyp segmentation is key to early CRC diagnosis and treatment. This work proposes a deep learning image segmentation framework that combines a U-Net model with attention gates (AGs) and systematically evaluates how the choice of encoder architecture affects colorectal polyp segmentation. The encoders compared are CNN-based (DenseNet, DeepLabV3, ConvNeXt), Transformer-based (ViT, Swin), and hybrid CNN–Transformer models (MaxViT, SegFormer, EdgeNext). The novelty lies in this comprehensive comparative analysis across diverse architectural paradigms, using AGs for feature refinement, on three publicly available datasets (Kvasir-SEG, CVC-ClinicDB, and CVC-ColonDB), which provides insight into the optimal encoder choice for this specific medical imaging task. Performance was assessed with the Dice coefficient, the mean intersection over union (mIoU) score, and ROC analysis. Overall, ConvNeXt-U-Net with AGs achieved the best segmentation performance, with a Dice score of 0.9116 and an mIoU score of 0.8522. The ROC analysis, with an AUC of 0.99, bolsters confidence in its accuracy. The loss plots indicate effective model training, and selecting models by validation performance supports good generalization for the reported metrics, although some architectures show potential for overfitting in later training epochs. Our results suggest that hybrid CNN–Transformer architectures can exploit both local and global features, which makes them particularly beneficial for medical image segmentation. This research emphasizes the potential of advanced deep learning methods to assist in CRC detection and treatment planning.
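
For readers unfamiliar with the two overlap metrics named in the abstract, the following is a minimal sketch of how a Dice coefficient and an IoU score are commonly computed for one binary segmentation mask. It is not the authors' evaluation code: the function names, the `eps` smoothing term, and the use of plain nested lists are assumptions made for illustration, and the abstract does not state whether mIoU is averaged over images or over classes.

```python
from typing import Sequence, Tuple

# Assumed representation: a 2D binary mask, 1 = polyp pixel, 0 = background.
Mask = Sequence[Sequence[int]]


def overlap_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count (intersection, predicted-positive, ground-truth-positive) pixels."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
    return inter, pred_sum, target_sum


def dice_coefficient(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); eps (an assumption) avoids 0/0 on empty masks."""
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    return (2 * inter + eps) / (pred_sum + target_sum + eps)


def iou_score(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """IoU = |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|."""
    inter, pred_sum, target_sum = overlap_counts(pred, target)
    union = pred_sum + target_sum - inter
    return (inter + eps) / (union + eps)


# Toy example (made-up masks, not data from the paper).
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
dice = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 3), about 0.667
iou = iou_score(pred_mask, true_mask)          # 2 / 4, about 0.5
```

For a single mask pair the two metrics are related by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity does not carry over exactly to scores averaged across a dataset, such as the 0.9116 and 0.8522 reported above.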

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
