https://doi.org/10.1140/epjs/s11734-025-01508-z
Regular Article
Machine learning-based detection of alcohol intoxication through speech analysis: a comparative study of AI models
1 Tomsk State University of Control Systems and Radioelectronics, 634050, Tomsk, Russia
2 Lobachevsky State University, 603022, Nizhny Novgorod, Russia
a valeriia.demareva@fsn.unn.ru
Received: 31 October 2024
Accepted: 6 February 2025
Published online: 27 February 2025
Alcohol intoxication significantly impairs cognitive and motor functions, posing serious risks in safety-critical environments such as transportation and public security. Reliable methods for detecting intoxication are essential for preventing accidents and ensuring public safety. This study evaluates the effectiveness of machine learning and deep learning algorithms in detecting alcohol intoxication through speech analysis. The dataset comprised 636 tongue twister recordings collected under two conditions: sober [blood alcohol concentration (BAC) = 0.0] and intoxicated (BAC = 0.15). The gradient boosting classifier achieved the highest F1 score of 0.78, outperforming logistic regression models (max F1 = 0.70) and convolutional neural networks (F1 = 0.75), highlighting its superior performance for intoxication detection in speech-based applications. In a secondary analysis, we investigated the impact of intoxication on speech recognition using four models: VOSK, Google Speech-to-Text, Whisper, and Caesar-R. Among these, the Whisper model consistently outperformed the others across key metrics, demonstrating robust performance in both sober and intoxicated states. Recognition quality generally declined under intoxicated conditions for VOSK and Caesar-R, while the Google and Whisper models exhibited improved performance. To align automated assessments with human evaluations of speech clarity, we used expert analysis to determine the most reliable metric. The cosine similarity metric demonstrated the strongest potential as a robust evaluation tool for alcohol detection systems. Overall, this study identifies gradient boosting and Whisper as highly effective methods for automated alcohol intoxication detection through speech. These findings present significant implications for safety monitoring and public security applications, particularly in scenarios requiring non-invasive and efficient evaluation.
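As a minimal illustration of the two metrics named in the abstract, the sketch below computes an F1 score for a binary sober/intoxicated labelling and a cosine similarity between a reference text and a recognized transcript. The abstract does not say how the texts were vectorized, so the whitespace tokenization and bag-of-words counts used here are assumptions, and the function names `f1_score` and `cosine_similarity` are hypothetical helpers rather than the authors' code.

```python
import math
from collections import Counter


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(reference, hypothesis):
    """Cosine similarity between bag-of-words count vectors of two texts.

    Assumption: lowercased, whitespace-separated tokens. The vectorization
    used in the study itself is not specified in the abstract.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    dot = sum(ref_counts[w] * hyp_counts[w] for w in ref_counts)
    norm_ref = math.sqrt(sum(c * c for c in ref_counts.values()))
    norm_hyp = math.sqrt(sum(c * c for c in hyp_counts.values()))
    if norm_ref == 0 or norm_hyp == 0:
        return 0.0
    return dot / (norm_ref * norm_hyp)
```

Under these assumptions, a transcript that matches the reference word for word scores 1.0, and a transcript sharing no words with it scores 0.0; dropped or substituted words lower the score in between.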
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
© The Author(s), under exclusive licence to EDP Sciences, Springer-Verlag GmbH Germany, part of Springer Nature 2025