https://doi.org/10.1140/epjs/s11734-024-01291-3
Regular Article
Explainable AI and tree-based ensemble models: a comparative study in predicting chemical pulmonary toxicity
1
Department of Computer and Information Sciences, Northumbria University Newcastle, Newcastle Upon Tyne, UK
2
Division of Mathematics, SAS, VIT Chennai, Chennai, India
3
School of Engineering, Newcastle University, Newcastle Upon Tyne, UK
a
keerthana.jaganathan@northumbria.ac.uk
Received:
2
June
2024
Accepted:
2
August
2024
Published online:
30
August
2024
Chemical-induced pulmonary toxicity, characterized by adverse respiratory effects from various drugs or chemicals, is increasingly becoming a point of concern for the pharmaceutical and chemical sectors, as well as public health. Traditional toxicity prediction methods are not only expensive but also demand significant time and effort. In response to these challenges, we focus on computational models to identify potential pulmonary toxicants early in the drug development process. Early identification of toxicity not only enhances the safety and efficiency of drugs and chemicals but also helps prevent late-stage drug withdrawals. In this study, we compared various sets of molecular descriptors and fingerprints using Mordred and RDKit software. We systematically employed feature selection techniques to identify the key molecular and structural features that significantly affect the model’s performance. We then applied a variety of tree-based ensemble machine-learning algorithms to build the proposed model, using a tenfold cross-validation methodology to enhance the model’s ability to predict pulmonary toxicity. We subsequently evaluated the proposed model’s performance using both a test set and a separate external validation set to assess reliability. The proposed optimal tree-ensemble model achieved an accuracy of 85.07% during tenfold cross-validation and 86.88% on the test set. Additionally, we applied the SHapley Additive exPlanations (SHAP) approach to gain deeper insights into the crucial molecular features influencing pulmonary toxicity predictions. Thus, the proposed model emerged as a promising tool for the early screening of potential pulmonary toxic compounds, enhancing chemical safety and providing interpretability for the predictions.
© The Author(s) 2024
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.