Multi-class Classification for Breast Cancer with High Dimensional Microarray Data Using Machine Learning Classifier

Bibliographic Details
Published in: Lecture Notes on Data Engineering and Communications Technologies
Authors: Abdullah M.N.; Yap B.W.; Sapri N.N.F.F.; Wan Yaacob W.F.
Format: Book chapter
Language: English
Published: Springer Science and Business Media Deutschland GmbH, 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151922471&doi=10.1007%2f978-981-99-0741-0_24&partnerID=40&md5=322a0812d31d340699ad65802dd9ff96
Volume: 165
DOI: 10.1007/978-981-99-0741-0_24
ISSN: 2367-4512
Scopus EID: 2-s2.0-85151922471

Abstract: Breast cancer is one of the leading causes of cancer-related deaths among women, and early detection is essential for proper treatment and for reducing mortality. Most cancer prediction studies have focused on binary classification of breast cancer; this study addresses multi-class classification of breast cancer using high-dimensional microarray data. The dataset included 38 cancer patients in three categories: normal (9), early tumour (12), and late tumour (17), measured on 39,426 microarray biomarkers. Boruta's feature selection algorithm selected 28 important microarray biomarkers. The performance of support vector machine, multinomial logistic regression, Naïve Bayes, and random forest classifiers was evaluated using macro- and micro-averaged accuracy, sensitivity, and precision. The results showed that multinomial logistic regression, Naïve Bayes, and random forest exhibited overfitting, whereas the support vector machine performed well in multi-class classification of breast cancer (macro-averaged test accuracy = 86.7%, test sensitivity = 77.8%, and test precision = 62.0%). In future work, bagging and boosting combined with oversampling techniques may be considered to improve multi-class classification of breast cancer using high-dimensional microarray data. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
