Summary: | Diabetes mellitus is a chronic condition that significantly impacts public health and socioeconomic growth worldwide. Risk prediction models can support clinical management decisions by identifying patients at higher risk of developing type 2 diabetes mellitus. Such identification is critical for these patients because it informs healthcare and lifestyle changes. This study therefore aims to uncover significant risk factors for type 2 diabetes mellitus classification. A second goal is to identify the best prediction model by comparing the accuracy of each model. The study employs several classification models, including logistic regression, decision tree and naïve Bayes. Several feature selection methods are applied to the classification models: forward selection, backward elimination and optimized selection (evolutionary). The analysis was conducted using the Diabetes BRFSS2015 dataset, obtained from the Kaggle website. The dataset consists of 76902 observations with 20 explanatory variables and one dichotomous target variable. The findings show that only eight of the 20 risk factors are identified as significant in the prediction models: Age, GenHlth, Sex, HvyAlcoholConsump, HighBP, HighChol, NoDocbcNoCost and Veggies. Among the nine prediction models, logistic regression with optimized selection had the highest accuracy, at 75.61%. Logistic regression with optimized selection is therefore the best model for predicting type 2 diabetes. The study hopes to promote awareness and provide more insight into the risk factors for type 2 diabetes. Accurate prediction and early recognition of type 2 diabetes could lead to prompt, effective treatment and fewer complications. © 2023 IEEE.
|