Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles

Bibliographic Details
Published in: Lecture Notes on Data Engineering and Communications Technologies
Main Author: Abdullah M.N.; Wah Y.B.
Format: Book chapter
Language: English
Published: Springer Science and Business Media Deutschland GmbH 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192727185&doi=10.1007%2f978-981-97-0293-0_40&partnerID=40&md5=db178ceadd1dbdca58b5a249ae07281c
Description
Summary: Breast cancer (BC) is a leading global health challenge, with survival rates varying significantly across regions due to socio-economic disparities and healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for precise BC classification using gene expression data. Using the CuMiDa database’s microarray BC dataset, which includes 35,983 gene biomarkers from 146 breast adenocarcinoma patients and 143 normal subjects, the study carried out data pre-processing and feature selection in R. The Boruta algorithm identified 214 key biomarkers, and the dataset was subsequently balanced using the SMOTE technique. Among the seven ML classifiers assessed, the support vector machine (SVM) achieved the best sensitivity, specificity, and accuracy, while naïve Bayes (NB) underperformed. On this BC dataset, SVM was the best-performing ML classifier, highlighting its potential for enhancing BC predictive modelling. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
ISSN: 2367-4512
DOI: 10.1007/978-981-97-0293-0_40
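
The summary compares classifiers by sensitivity, specificity, and accuracy. As a rough illustration of how those three metrics relate to a classifier's predictions, here is a minimal Python sketch. It is not the authors' code (the study used R); the function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # true positive rate: TP / (TP + FN)
    specificity: float  # true negative rate: TN / (TN + FP)
    accuracy: float     # (TP + TN) / all samples


def binary_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, and accuracy from paired labels.

    Hypothetical helper: `positive` marks the class treated as "cancer".
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    total = tp + fn + tn + fp
    nan = float("nan")
    return BinaryMetrics(
        # Guard against division by zero when a class is absent.
        sensitivity=tp / (tp + fn) if (tp + fn) else nan,
        specificity=tn / (tn + fp) if (tn + fp) else nan,
        accuracy=(tp + tn) / total if total else nan,
    )


if __name__ == "__main__":
    # Made-up labels, not data from the study: 1 = tumour, 0 = normal.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier's errors fall mainly on tumour samples or on normal samples, which accuracy alone does not reveal.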