Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles
Breast cancer (BC) is a leading global health challenge, with survival rate varying significantly across regions due to socio-economic disparities and healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for precise BC classification using gen...
Published in: | Lecture Notes on Data Engineering and Communications Technologies |
---|---|
Main Author: | Abdullah M.N.; Wah Y.B. |
Format: | Book chapter |
Language: | English |
Published: | Springer Science and Business Media Deutschland GmbH, 2024 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192727185&doi=10.1007%2f978-981-97-0293-0_40&partnerID=40&md5=db178ceadd1dbdca58b5a249ae07281c |
id |
2-s2.0-85192727185 |
---|---|
author |
Abdullah M.N.; Wah Y.B. |
title |
Evaluation of Machine Learning Models for Breast Cancer Detection in Microarray Gene Expression Profiles |
publishDate |
2024 |
container_title |
Lecture Notes on Data Engineering and Communications Technologies |
container_volume |
191 |
doi_str_mv |
10.1007/978-981-97-0293-0_40 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192727185&doi=10.1007%2f978-981-97-0293-0_40&partnerID=40&md5=db178ceadd1dbdca58b5a249ae07281c |
description |
Breast cancer (BC) is a leading global health challenge, with survival rate varying significantly across regions due to socio-economic disparities and healthcare accessibility. This research seeks to identify the most efficient machine learning (ML) classifier for precise BC classification using gene expression data. Utilizing the CuMiDa database’s microarray BC dataset, which includes 35,983 gene biomarkers from 146 breast adenocarcinoma patients and 143 normal subjects, the study employed R-programming for data pre-processing and feature selection. The Boruta algorithm pinpointed 214 key biomarkers, and the dataset was subsequently balanced using the SMOTE technique. Among the seven ML classifiers assessed, the support vector machine (SVM) showcased superior performance metrics such as sensitivity, specificity, and accuracy, while naïve Bayes (NB) underperformed. A thorough examination of the BC dataset revealed that SVM is the premier ML classifier, highlighting its potential for enhancing BC predictive modelling. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. |
publisher |
Springer Science and Business Media Deutschland GmbH |
issn |
23674512 |
language |
English |
format |
Book chapter |
record_format |
scopus |
collection |
Scopus |
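
The description compares classifiers by sensitivity, specificity, and accuracy. As a minimal sketch of how these three metrics are conventionally computed from binary predictions, the Python snippet below may help; it is not the authors' code (the record states the study used R), the function name `binary_metrics` is hypothetical, and the labels are made-up illustrative values, not results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity and accuracy for binary labels.

    Hypothetical helper for illustration only; `positive` marks the
    label treated as the cancer (positive) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # cancer sample correctly flagged
            else:
                fn += 1  # cancer sample missed
        else:
            if pred == positive:
                fp += 1  # normal sample wrongly flagged
            else:
                tn += 1  # normal sample correctly cleared

    total = tp + fn + tn + fp
    return {
        # Share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of true negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Made-up labels (1 = tumour, 0 = normal), NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
example = binary_metrics(y_true, y_pred)
print(example)
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier's errors fall mainly on missed tumour samples or on wrongly flagged normal samples, which accuracy alone does not reveal.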