Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles

Breast cancer (BC) is a leading global health challenge, with survival rates varying significantly across regions due to socio-economic disparities and healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for precise BC classification using gen...

Bibliographic Details
Published in: Lecture Notes on Data Engineering and Communications Technologies
Main Authors: Abdullah M.N.; Wah Y.B.
Format: Book chapter
Language: English
Published: Springer Science and Business Media Deutschland GmbH 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192727185&doi=10.1007%2f978-981-97-0293-0_40&partnerID=40&md5=db178ceadd1dbdca58b5a249ae07281c
id 2-s2.0-85192727185
author Abdullah M.N.; Wah Y.B.
title Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles
publishDate 2024
container_title Lecture Notes on Data Engineering and Communications Technologies
container_volume 191
container_issue
doi_str_mv 10.1007/978-981-97-0293-0_40
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192727185&doi=10.1007%2f978-981-97-0293-0_40&partnerID=40&md5=db178ceadd1dbdca58b5a249ae07281c
description Breast cancer (BC) is a leading global health challenge, with survival rates varying significantly across regions due to socio-economic disparities and healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for precise BC classification using gene expression data. Using the microarray BC dataset from the CuMiDa database, which includes 35,983 gene biomarkers from 146 breast adenocarcinoma patients and 143 normal subjects, the study used R for data pre-processing and feature selection. The Boruta algorithm identified 214 key biomarkers, and the dataset was subsequently balanced using the SMOTE technique. Among the seven ML classifiers assessed, the support vector machine (SVM) achieved the best sensitivity, specificity, and accuracy, while naïve Bayes (NB) underperformed. On this BC dataset, SVM was the best-performing ML classifier, highlighting its potential for enhancing BC predictive modelling. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
publisher Springer Science and Business Media Deutschland GmbH
issn 23674512
language English
format Book chapter
accesstype
record_format scopus
collection Scopus
_version_ 1809677884755804160