Multi-class Classification for Breast Cancer with High Dimensional Microarray Data Using Machine Learning Classifier
Breast cancer is one of the leading causes of cancer-related deaths among women. Early detection of breast cancer is very important for proper treatment and for decreasing the risk of death among women. Most cancer prediction studies have focused on binary classification of breast cancer. This study focused on mu...
Published in: | Lecture Notes on Data Engineering and Communications Technologies |
---|---|
Main Authors: | Abdullah M.N.; Yap B.W.; Sapri N.N.F.F.; Wan Yaacob W.F. |
Format: | Book chapter |
Language: | English |
Published: | Springer Science and Business Media Deutschland GmbH, 2023 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151922471&doi=10.1007%2f978-981-99-0741-0_24&partnerID=40&md5=322a0812d31d340699ad65802dd9ff96 |
id |
2-s2.0-85151922471 |
---|---|
author |
Abdullah M.N.; Yap B.W.; Sapri N.N.F.F.; Wan Yaacob W.F. |
title |
Multi-class Classification for Breast Cancer with High Dimensional Microarray Data Using Machine Learning Classifier |
publishDate |
2023 |
container_title |
Lecture Notes on Data Engineering and Communications Technologies |
container_volume |
165 |
container_issue |
|
doi_str_mv |
10.1007/978-981-99-0741-0_24 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151922471&doi=10.1007%2f978-981-99-0741-0_24&partnerID=40&md5=322a0812d31d340699ad65802dd9ff96 |
description |
Breast cancer is one of the leading causes of cancer-related deaths among women. Early detection of breast cancer is very important for proper treatment and for decreasing the risk of death among women. Most cancer prediction studies have focused on binary classification of breast cancer. This study focused on multi-class classification of breast cancer with high-dimensional microarray data. The dataset involved 38 cancer patients in three categories: normal (9), early tumour (12), and late tumour (17), and 39,426 microarray biomarkers. The Boruta feature selection algorithm selected 28 important microarray biomarkers. The performance of support vector machine, multinomial logistic regression, Naïve Bayes, and random forest classifiers was evaluated based on macro and micro accuracy, sensitivity, and precision. Results showed that multinomial logistic regression, Naïve Bayes, and random forest exhibit overfitting. However, the support vector machine performed well in multi-class classification of breast cancer (on the test set: macro accuracy = 86.7%, macro sensitivity = 77.8%, and macro precision = 62.0%). In future work, bagging and boosting with oversampling techniques can be considered to improve multi-class classification of breast cancer using high-dimensional microarray data. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. |
publisher |
Springer Science and Business Media Deutschland GmbH |
issn |
23674512 |
language |
English |
format |
Book chapter |
accesstype |
|
record_format |
scopus |
collection |
Scopus |
_version_ |
1809677591620091904 |
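
The description above reports macro- and micro-averaged accuracy, sensitivity, and precision for a three-class problem. As a rough illustration of how such averages are commonly derived from a confusion matrix, here is a minimal Python sketch. The function name `averaged_metrics`, the one-vs-rest definition of per-class accuracy, and the example matrix are all assumptions made for illustration; the record does not state which definitions the chapter used, and the numbers below are not the study's data.

```python
from typing import Dict, List


def averaged_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro/micro metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Per-class accuracy uses a one-vs-rest view
    (an assumption; other definitions exist).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)

    # One-vs-rest counts for each class.
    tp = [cm[i][i] for i in range(k)]
    fn = [sum(cm[i]) - tp[i] for i in range(k)]
    fp = [sum(cm[r][i] for r in range(k)) - tp[i] for i in range(k)]
    tn = [total - tp[i] - fn[i] - fp[i] for i in range(k)]

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of failing when a class has no samples.
        return num / den if den else 0.0

    sens = [ratio(tp[i], tp[i] + fn[i]) for i in range(k)]
    prec = [ratio(tp[i], tp[i] + fp[i]) for i in range(k)]
    acc = [ratio(tp[i] + tn[i], total) for i in range(k)]

    return {
        # Macro: average the per-class scores, each class weighted equally.
        "macro_sensitivity": sum(sens) / k,
        "macro_precision": sum(prec) / k,
        "macro_accuracy": sum(acc) / k,
        # Micro: pool the counts over classes first, then compute the score.
        "micro_sensitivity": ratio(sum(tp), sum(tp) + sum(fn)),
        "micro_precision": ratio(sum(tp), sum(tp) + sum(fp)),
        "micro_accuracy": ratio(sum(tp) + sum(tn), k * total),
    }


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
example_cm = [
    [2, 1, 0],
    [0, 3, 1],
    [0, 1, 4],
]
metrics = averaged_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as those listed in the description (9, 12, and 17 samples), macro averages give each class equal weight, while micro averages are dominated by the larger classes, which is one reason both are often reported side by side.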