https://doi.org/10.1140/epjs/s11734-025-01792-9
Regular Article
Performance evaluation of classification algorithms and feature selection methods for predicting stroke mortality based on blood test results
1 Graduate School of Health Sciences, Health Informatics M.S. Program, Üsküdar University, 34662, Istanbul, Turkey
2 Faculty of Engineering and Natural Sciences, Software Engineering Department, Üsküdar University, 34662, Istanbul, Turkey
a kristin.benli@uskudar.edu.tr
Received: 1 April 2025 / Accepted: 6 July 2025 / Published online: 14 July 2025
Stroke is a major cause of death both in Turkey and worldwide. Healthcare professionals need prognostic predictions to guide treatment decisions. This study aims to build a classification model that accurately predicts mortality from the routine blood test findings of patients treated in the neurology intensive care and stroke units of a tertiary city hospital in Istanbul, and to identify the blood values that have the greatest impact on mortality prediction. After the missing data and class imbalance in the dataset were addressed, several classification algorithms were employed: naive Bayes, decision tree, random forest, multilayer perceptron, logistic regression, support vector machine, K-nearest neighbor, and repeated incremental pruning to produce error reduction (RIPPER). Feature selection algorithms were also used to identify the optimal feature subset. For each classifier, results were computed using (i) the full feature set, (ii) the full feature set with the synthetic minority oversampling technique (SMOTE), (iii) correlation-based feature selection, (iv) information gain feature selection, and (v) gain ratio feature selection. The random forest method outperformed the other classifiers, achieving an accuracy of 90.09%, with 90.41% precision, 90.10% recall, and 90.10% F-measure. The feature selection algorithms identified neutrophil percentage (NEU%), lymphocyte percentage (LYM%), and basophil percentage (BAS%) as the most important features for predicting stroke mortality; using only these three features, the random forest model achieved an accuracy of 83.96%. These findings indicate that blood values, especially NEU%, LYM%, and BAS%, contribute to mortality prediction, with random forest delivering the strongest performance among the classifiers tested.
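The information gain and gain ratio criteria used for feature ranking in (iv) and (v) can be illustrated with a short sketch. It is not the authors' implementation. It assumes each blood value has already been discretized into bins, since the abstract does not state how numeric values were handled. The marker names (`marker_a`, `marker_b`) and toy values are hypothetical and are not data from the study. Information gain is H(C) - H(C | F), where C is the outcome class and F the feature; gain ratio divides this by the split information H(F), which penalizes features with many distinct values.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """H(class) - H(class | feature) for a discrete (binned) feature."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Class labels of the patients whose feature falls in this bin.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalized by the split information H(feature)."""
    split_info = entropy(feature)
    if split_info == 0.0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature, labels) / split_info


# Hypothetical toy data (NOT from the study): two binned markers and outcomes.
outcome = ["died", "died", "survived", "survived",
           "survived", "survived", "died", "survived"]
marker_a = ["high", "high", "high", "low", "low", "low", "high", "low"]
marker_b = ["low", "low", "high", "high", "low", "high", "high", "low"]

features = {"marker_a": marker_a, "marker_b": marker_b}

# Rank features by information gain, highest first.
ranking = sorted(features,
                 key=lambda name: information_gain(features[name], outcome),
                 reverse=True)

for name in ranking:
    print(name,
          round(information_gain(features[name], outcome), 4),
          round(gain_ratio(features[name], outcome), 4))
```

On this toy data `marker_a` separates the classes far better than `marker_b` (information gain of about 0.549 versus 0.049 bits), so it is ranked first. Because both markers here split the patients into two equal bins, H(F) = 1 and gain ratio equals information gain; the two criteria diverge only when features have different numbers or sizes of bins.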
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
© The Author(s), under exclusive licence to EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature 2025