https://doi.org/10.1140/epjs/s11734-025-01720-x
Regular Article
Predictive analytics for thyroid cancer recurrence: a feature selection and data balancing approach
1 Department of Computer Engineering, Zonguldak Bulent Ecevit University, 67100, Zonguldak, Turkey
2 Department of Electrical and Electronics Engineering, Zonguldak Bulent Ecevit University, 67100, Zonguldak, Turkey
Received: 13 February 2025 / Accepted: 28 May 2025 / Published online: 19 June 2025
Thyroid cancer recurrence presents considerable challenges in clinical practice, underscoring the need for accurate predictive models to guide timely interventions. This study introduces a hybrid machine learning (ML) framework that combines data balancing and feature selection to improve recurrence prediction. Using the Differentiated Thyroid Cancer Recurrence dataset, the framework evaluates nine ML classifiers with an 80:20 stratified train-test split and stratified 5-fold cross-validation. Among the evaluated models, ensemble methods, particularly Random Forest and Bagging, perform best on SMOTE-balanced data in terms of accuracy and recall, outperforming previously reported methods. Statistical analyses further confirm that the effect of feature selection varies with classifier architecture. Overall, the results show that combining data balancing with informed feature selection significantly improves predictive performance. The framework's interpretability and robustness support its integration into clinical decision-support systems for the early detection of thyroid cancer recurrence and for personalized treatment planning. The approach may also be applicable to other imbalanced medical datasets.
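The abstract names two data-handling steps without detail: SMOTE balancing and stratified splitting. The sketch below is a minimal, standard-library illustration of both under stated assumptions, not the authors' code: features are assumed numeric, and `smote_oversample` and `stratified_folds` are hypothetical helpers. SMOTE is shown in its basic form (interpolate between a minority sample and one of its k nearest minority neighbours); in practice one would more likely use imbalanced-learn's `SMOTE` together with scikit-learn's `StratifiedKFold` or `train_test_split(..., stratify=y)`. Feature selection and the nine classifiers are not reproduced here.

```python
import math
import random
from collections import defaultdict


def smote_oversample(X, y, minority_label, k=5, seed=0):
    """Hypothetical helper: basic SMOTE-style oversampling of one minority class.

    Each synthetic point lies at a random position on the segment between a
    minority sample and one of its k nearest minority neighbours. Points are
    added until the minority class matches the largest other class.
    """
    rng = random.Random(seed)
    minority = [list(x) for x, label in zip(X, y) if label == minority_label]
    counts = defaultdict(int)
    for label in y:
        counts[label] += 1
    majority_size = max(n for label, n in counts.items() if label != minority_label)
    n_new = max(0, majority_size - len(minority))
    if n_new and len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    # Neighbours are searched within the minority class only (Euclidean distance).
    neighbours = []
    for i, xi in enumerate(minority):
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: math.dist(xi, minority[j]))
        neighbours.append(order[:k])
    X_new, y_new = [list(x) for x in X], list(y)
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # interpolation factor in [0, 1)
        X_new.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
        y_new.append(minority_label)
    return X_new, y_new


def stratified_folds(y, n_splits=5, seed=0):
    """Hypothetical helper: split indices into n_splits folds that keep
    roughly the overall class proportions (round-robin dealing per class)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % n_splits].append(idx)
    return folds


if __name__ == "__main__":
    # Toy imbalanced data: 8 majority (label 0) vs 3 minority (label 1) points.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [2, 1], [1, 2], [0, 2],
         [5, 5], [6, 5], [5, 6]]
    y = [0] * 8 + [1] * 3
    for fold_id, test_idx in enumerate(stratified_folds(y, n_splits=3)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Oversample the training portion only; the held-out fold stays untouched.
        X_bal, y_bal = smote_oversample(X_tr, y_tr, minority_label=1, k=2)
        print(fold_id, y_tr.count(1), "->", y_bal.count(1), "of", len(y_bal))
```

Oversampling inside each training fold, rather than before splitting, is a common safeguard that keeps synthetic points derived from held-out samples out of training; the abstract does not state which order the authors used.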
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.