https://doi.org/10.1140/epjs/s11734-024-01201-7
Regular Article
When climate variables improve the dengue forecasting: a machine learning approach
1 Federal University of Paraná, 81531-980, Curitiba, PR, Brazil
2 Potsdam Institute for Climate Impact Research, Telegrafenberg A31, 14473, Potsdam, Germany
3 Department of Physics, Humboldt University Berlin, Newtonstraße 15, 12489, Berlin, Germany
4 Graduate Program in Science, State University of Ponta Grossa, 84030-900, Ponta Grossa, PR, Brazil
5 Institute of Physics, University of São Paulo, 05508-090, São Paulo, SP, Brazil
6 University Center UNIFATEB, 84266-010, Telêmaco Borba, PR, Brazil
7 Department of Mathematics and Statistics, State University of Ponta Grossa, 84030-900, Ponta Grossa, PR, Brazil
Received: 4 April 2024
Accepted: 5 June 2024
Published online: 17 June 2024
Dengue is a viral vector-borne infectious disease that affects many countries worldwide, infecting around 390 million people per year. The main outbreaks occur in tropical and subtropical countries. We therefore study the influence of climate on dengue. In particular, we consider dengue and meteorological data from Natal, Brazil (2016–2019), Iquitos, Peru (2001–2012), and Barranquilla, Colombia (2011–2016). For the analysis and simulations, we apply machine learning (ML) techniques, in particular the random forest (RF) algorithm. We use dengue case counts and climate data delayed by up to one week to forecast dengue cases. As features for the ML technique, we analyze three possibilities: only dengue cases (D); climate and dengue cases (CD); and humidity and dengue cases (HD). Our results show that whether climate data improve the forecast depends on the city. For Natal, case D yields the best forecast. For Iquitos, it is better to use all the climate variables. For Barranquilla, the forecast is best when we include cases and humidity data. Another important result is that each city has an optimal range of training length. For Natal, when we use more than 64% and less than 80% of the time series for training, we obtain correlation coefficients (r) between 0.917 and 0.949 and mean absolute errors (MAE) between 57.783 and 71.768 for the D case. For Iquitos, the optimal range is obtained when 79% to 88% of the time series is used for training. Here the best case is CD, with r between 0.850 and 0.887 and MAE between 2.780 and 4.156. For Barranquilla, the optimal range is between 72% and 82% of the time series. Here the best approach is HD, with r between 0.942 and 0.953 and MAE between 6.085 and 6.669.
We show that forecasting dengue cases is a challenging problem and that climate variables do not always help. However, when the mentioned climate variables are included, the most important one is humidity.
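The setup described in the abstract (cases and climate data delayed by one week, the three feature sets D, CD and HD, a training fraction of the time series, and evaluation by r and MAE) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy numbers and the `temperature` variable are assumptions, and a simple persistence forecast stands in for the random forest so that the example needs only the Python standard library.

```python
from math import sqrt


def make_lagged_rows(cases, climate, lag=1, feature_set="CD"):
    """Pair each week's case count with features delayed by `lag` weeks.

    feature_set: "D" (cases only), "HD" (cases + humidity),
                 "CD" (cases + every climate variable supplied).
    """
    if feature_set == "D":
        names = []
    elif feature_set == "HD":
        names = ["humidity"]
    elif feature_set == "CD":
        names = sorted(climate)  # fixed column order
    else:
        raise ValueError("feature_set must be 'D', 'HD' or 'CD'")
    X, y = [], []
    for t in range(lag, len(cases)):
        X.append([cases[t - lag]] + [climate[n][t - lag] for n in names])
        y.append(cases[t])
    return X, y


def split_by_fraction(X, y, train_fraction):
    """Chronological split: the first part trains, the rest is forecast."""
    n_train = int(len(y) * train_fraction)
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((z - mb) ** 2 for z in b)
    return cov / sqrt(var_a * var_b)


def mean_absolute_error(a, b):
    """Mean of the absolute differences between two equal-length lists."""
    return sum(abs(x - z) for x, z in zip(a, b)) / len(a)


# Toy weekly series (made-up numbers, NOT data from the paper).
cases = [10, 12, 15, 20, 18, 14, 11, 9, 13, 17]
climate = {
    "humidity": [70, 72, 75, 80, 78, 74, 71, 69, 73, 77],
    "temperature": [25, 26, 27, 28, 27, 26, 25, 24, 26, 27],  # assumed variable
}

X, y = make_lagged_rows(cases, climate, lag=1, feature_set="HD")
X_train, y_train, X_test, y_test = split_by_fraction(X, y, train_fraction=0.7)

# Placeholder model: predict this week's cases as last week's cases.
# A random forest regressor fitted on (X_train, y_train) would go here instead.
y_pred = [row[0] for row in X_test]

r = pearson_r(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"r = {r:.3f}, MAE = {mae:.3f}")
```

Repeating the last four steps over a range of `train_fraction` values and over the three feature sets mirrors the comparison reported in the abstract, under the assumptions stated above.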
© The Author(s) 2024
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.