2Department of Computer Engineering, Süleyman Demirel University, Isparta, Türkiye
Abstract
Background: This study aimed to evaluate the risk factors for cardiovascular disease, and their relative importance, using explainable machine learning models.
Methods: In this retrospective study, multiple databases were searched, and data on 11 risk factors for 70 000 patients were obtained. The data comprised risk factors highly associated with cardiovascular disease and a label indicating the presence or absence of cardiovascular disease. The explainable prediction model was constructed using 7 machine learning algorithms, including Random Forest Classifier, Extreme Gradient Boost Classifier, Decision Tree Classifier, KNeighbors Classifier, Support Vector Machine Classifier, and GaussianNB. The receiver operating characteristic curve, Brier score, and mean accuracy were used to assess model performance. The interpretability of the predictions was examined using Shapley additive explanations (SHAP) values.
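As an illustration of the three performance measures named in the Methods, the following minimal sketch computes accuracy, the Brier score, and the area under the receiver operating characteristic curve in plain Python. It is not the study's code: the function names, the 0.5 decision threshold, and the six toy labels and probabilities are all assumptions made for the example.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label.

    The 0.5 threshold is an assumption for this sketch.
    """
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.

    Lower is better; 0 is a perfect probabilistic forecast.
    """
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cardiovascular disease present, 0 = absent.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.40, 0.35, 0.80, 0.70, 0.60]

print("accuracy:", accuracy(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("AUC:", roc_auc(y_true, y_prob))
```

The pairwise AUC loop is quadratic in the number of cases and is written for readability; on a data set of the size described here, a rank-based implementation from an established library would be the practical choice.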
Results: The Extreme Gradient Boost model, the best-performing prediction model, achieved an accuracy of 0.739, an area under the curve of 0.803, and a Brier score of 0.260. According to the permutation feature importance method and the Shapley additive explanations method, the most important risk factors were systolic blood pressure (ap_hi) [0.1335 ± 0.0045 w (weight)], cholesterol (0.0341 ± 0.0022 w), and age (0.0211 ± 0.0036 w).
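The idea behind permutation feature importance can be sketched as follows: shuffle one feature column at a time, measure how much the model's accuracy drops, and summarize the drop over several repeats as a mean and a spread. This is a hedged illustration only. The rule-based `toy_model`, its 140 mmHg cut-off, the `noise` feature, and the eight data rows are hypothetical, and whether the weights reported above were computed with exactly this procedure is an assumption.

```python
import random
import statistics


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(int(model(row) == label) for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Return one (mean drop, standard deviation) pair per feature.

    Each drop is baseline accuracy minus accuracy after shuffling
    that feature's column, which breaks its link to the outcome.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    results = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        results.append((statistics.mean(drops), statistics.stdev(drops)))
    return results


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: uses only ap_hi."""
    ap_hi, _noise = row
    return 1 if ap_hi >= 140 else 0


# Hypothetical rows of [ap_hi, noise]; labels agree with toy_model by design.
X = [[120, 3], [150, 7], [135, 1], [160, 9], [110, 4], [145, 2], [128, 8], [170, 5]]
y = [toy_model(row) for row in X]

feature_names = ["ap_hi", "noise"]
for name, (mean_drop, sd) in zip(feature_names, permutation_importance(toy_model, X, y)):
    print(f"{name}: {mean_drop:.4f} ± {sd:.4f}")
```

In this toy setting, shuffling `noise` never changes a prediction, so its importance is exactly zero, whereas shuffling `ap_hi` lowers accuracy. A feature with a larger mean drop is one the model relies on more heavily.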
Conclusion: The explainable machine learning model developed in this study can predict cardiovascular disease and explain the impact of individual risk factors. In the clinical setting in particular, this accurate, explainable, and transparent model can support early diagnosis of cardiovascular disease, identification of risk factors, and consideration of possible treatment options.