Summary: | Breast cancer (BC) is a leading global health challenge, with survival rates varying significantly across regions due to socio-economic disparities and differences in healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for accurate BC classification using gene expression data. Using the microarray BC dataset from the CuMiDa database, which includes 35,983 gene biomarkers from 146 breast adenocarcinoma patients and 143 normal subjects, the study employed R for data pre-processing and feature selection. The Boruta algorithm identified 214 key biomarkers, and the dataset was subsequently balanced using the SMOTE technique. Among the seven ML classifiers assessed, the support vector machine (SVM) performed best on metrics such as sensitivity, specificity, and accuracy, while naïve Bayes (NB) underperformed. On this BC dataset, SVM was the best-performing ML classifier, highlighting its potential for enhancing BC predictive modelling. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
|
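The summary compares classifiers by sensitivity, specificity, and accuracy. As a minimal sketch of how these three metrics are conventionally derived from binary predictions, the Python snippet below counts the confusion-matrix cells and applies the standard definitions. The function names (`confusion_counts`, `binary_metrics`) and the toy labels are hypothetical and chosen only for illustration; they are not taken from the study, which reports using R, and the numbers are not the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1          # tumour sample correctly flagged
        elif truth != positive and pred != positive:
            tn += 1          # normal sample correctly cleared
        elif truth != positive and pred == positive:
            fp += 1          # normal sample wrongly flagged
        else:
            fn += 1          # tumour sample missed
    return tp, tn, fp, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity, and accuracy as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # Sensitivity (recall): share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Accuracy: share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
    }


# Toy labels (hypothetical): 1 = breast adenocarcinoma, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier's errors fall mainly on missed tumour samples or on falsely flagged normal samples, which accuracy alone does not reveal.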