Summary: | Cardiovascular disease (CVD) is the leading cause of death worldwide, and its prevalence rises notably with age. This study used data mining to identify the key risk variables for CVD, comparing decision tree, logistic regression, and artificial neural network models and selecting the best performer. The risk factors considered included sex, age, education, smoking habits, medical history, and physiological measures. After the dataset was analysed and cleaned, the models were validated using confusion matrices and ROC curves. Variable importance was determined with SAS Enterprise Miner, which showed that age has a significant effect on CVD risk in the decision tree model. In logistic regression, age was the most important variable, with the lowest p-value (p = 0.0036). The artificial neural network (ANN) highlighted seven variables with high R-squared values, indicating their contribution to CVD risk. The results indicate that the ANN achieved the highest sensitivity and accuracy, while the decision tree achieved the highest specificity. Overall, the comparative analysis identifies the ANN as the best model for identifying CVD risk factors in the given dataset. © 2024 IEEE.
|
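The sensitivity, specificity, and accuracy used to compare the models are all derived from the confusion matrix. The following is a minimal sketch of those standard definitions, not the study's SAS Enterprise Miner workflow; the function name `confusion_metrics` and the counts are hypothetical and chosen only for illustration.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives for a binary classifier.
    """
    sensitivity = tp / (tp + fn)  # share of actual positives correctly flagged
    specificity = tn / (tn + fp)  # share of actual negatives correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only; these are not results from the study.
sens, spec, acc = confusion_metrics(tp=40, fp=10, tn=90, fn=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because a model can rank first on sensitivity and accuracy while another ranks first on specificity, as reported above, the three measures are best read together rather than reduced to a single score.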