Summary: | Breast cancer is one of the leading causes of cancer-related death among women. Early detection is essential for proper treatment and for reducing mortality risk. Most cancer prediction studies have focused on binary classification of breast cancer. This study focused on multi-class classification of breast cancer using high-dimensional microarray data. The dataset comprised 38 cancer patients in three categories, normal (9), early tumor (12), and late tumor (17), with 39,426 microarray biomarkers. The Boruta feature selection algorithm selected 28 important microarray biomarkers. The performance of support vector machine, multinomial logistic regression, Naïve Bayes, and random forest classifiers was evaluated using macro- and micro-averaged accuracy, sensitivity, and precision. Results showed that multinomial logistic regression, Naïve Bayes, and random forest exhibited overfitting. The support vector machine, however, performed well in multi-class classification of breast cancer (test-set macro accuracy = 86.7%, macro sensitivity = 77.8%, and macro precision = 62.0%). In future work, bagging and boosting combined with oversampling techniques can be considered to improve multi-class classification of breast cancer using high-dimensional microarray data. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
|
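The summary reports macro- and micro-averaged accuracy, sensitivity, and precision without defining them. The sketch below shows one common convention, which is an assumption here and not necessarily the one the authors used: each class is scored one-vs-rest, macro averaging takes the mean of the per-class scores, and micro averaging pools the counts across classes before computing the score. The function names and the 3x3 confusion matrix are hypothetical and are not data from the study.

```python
def per_class_counts(conf):
    """Return (tp, fp, fn, tn) per class, treating each class one-vs-rest.

    conf[i][j] is the number of samples with true class i predicted as class j.
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    counts = []
    for c in range(k):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true class c, predicted otherwise
        fp = sum(conf[r][c] for r in range(k)) - tp   # predicted c, true class differs
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def macro_micro(conf):
    """Compute macro- and micro-averaged accuracy, sensitivity, and precision."""
    counts = per_class_counts(conf)
    k = len(counts)

    # Macro: average the per-class scores, so every class weighs the same.
    macro = {
        "accuracy": sum(safe_div(tp + tn, tp + fp + fn + tn)
                        for tp, fp, fn, tn in counts) / k,
        "sensitivity": sum(safe_div(tp, tp + fn) for tp, fp, fn, tn in counts) / k,
        "precision": sum(safe_div(tp, tp + fp) for tp, fp, fn, tn in counts) / k,
    }

    # Micro: pool the counts over classes first, so larger classes weigh more.
    tp_sum = sum(c[0] for c in counts)
    fp_sum = sum(c[1] for c in counts)
    fn_sum = sum(c[2] for c in counts)
    tn_sum = sum(c[3] for c in counts)
    micro = {
        "accuracy": safe_div(tp_sum + tn_sum, tp_sum + fp_sum + fn_sum + tn_sum),
        "sensitivity": safe_div(tp_sum, tp_sum + fn_sum),
        "precision": safe_div(tp_sum, tp_sum + fp_sum),
    }
    return macro, micro


# Hypothetical confusion matrix (rows: true class, columns: predicted class)
# for the three categories normal, early tumor, late tumor.
toy_conf = [
    [2, 1, 0],
    [0, 3, 1],
    [0, 1, 4],
]
macro, micro = macro_micro(toy_conf)
print("macro:", macro)
print("micro:", micro)
```

With imbalanced classes such as the 9/12/17 split described in the summary, macro and micro sensitivity and precision can differ noticeably, because macro averaging gives the smallest class the same weight as the largest.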