Hyperparameter tuning and pipeline optimization via grid search method and tree-based autoML in breast cancer prediction

Bibliographic Details
Published in: Journal of Personalized Medicine
Authors: Radzi S.F.M.; Karim M.K.A.; Saripan M.I.; Rahman M.A.A.; Isa I.N.C.; Ibahim M.J.
Format: Article
Language: English
Published: MDPI 2021
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85116487206&doi=10.3390%2fjpm11100978&partnerID=40&md5=2128a1383f2029da26bb03cc12c198ab
Scopus EID: 2-s2.0-85116487206
Volume: 11
Issue: 10
DOI: 10.3390/jpm11100978
Automated machine learning (AutoML) has been recognized as a powerful tool for building systems that automate the design of machine learning (ML) pipelines and optimize model selection. In this study, we present the tree-based pipeline optimization tool (TPOT) as a method for identifying ML models with strong performance and less complex pipelines for breast cancer diagnosis. In TPOT, pre-processors and ML models are represented as expression trees, and pipelines are optimized by genetic programming (GP), a stochastic search technique. Radiomic features from the breast cancer data set guided the TPOT-based selection of ML pipelines. The same data were used to compare the TPOT-generated ML pipelines with selected ML classifiers optimized by grid search (GS). The pipeline combining principal component analysis (PCA) with random forest (RF) classification proved to be the most reliable, with the lowest complexity. TPOT model selection outperformed GS optimization. The RF classifier, combined with only two pre-processors, performed best among the models, with a precision of 0.83. By comparison, the GS-optimized support vector machine (SVM) classifier showed a difference of 12%, while the other two classifiers, naïve Bayes (NB) and artificial neural network multilayer perceptron (ANN-MLP), showed a difference of almost 39%. Performance was evaluated using sensitivity, specificity, accuracy, precision, and receiver operating characteristic (ROC) curve analysis. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
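The evaluation measures named in the abstract (sensitivity, specificity, accuracy, and precision) all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code: the `ConfusionCounts` container, the `classification_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute the four standard measures from confusion-matrix counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate (recall)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "accuracy": _ratio(c.tp + c.tn, total),    # share of correct predictions
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
    }


# Made-up counts for illustration only; these are NOT results from the study.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, a reported precision of 0.83 would mean that 83% of the cases a classifier flags as positive are true positives; it says nothing by itself about how many positive cases are missed, which is what sensitivity captures.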
ISSN: 2075-4426
Access type: All Open Access; Gold Open Access