Summary: | Automated machine learning (AutoML) offers significant benefits for real-life problems because it accelerates the development of machine learning models. In real-world scenarios such as analyzing companies’ environmental, social and governance (ESG) practices, where the dataset presents several challenges, AutoML is a promising way to address these complexities. Although researchers have shown significant interest in Genetic Programming (GP) in AutoML for handling complex datasets, one critical issue remains unresolved: a comprehensive understanding of the GP hyper-parameters that influence machine learning performance. While GP-based AutoML excels at automating many aspects of modelling, little research has examined the significance of individual features and of GP population size within GP-based AutoML models. This paper presents a comprehensive evaluation of model performance from multiple facets, including feature selection, GP population size, and different machine learning algorithms. Furthermore, this study provides insights into the association between Pearson correlations, machine learning performance, and feature importance. The findings demonstrate that incorporating all the determinants as features in GP-based AutoML, or relying solely on firm characteristics, leads to superior performance with an excellent trade-off between True Positive Rate and False Positive Rate. The proposed models achieve an Area Under the Curve (AUC) exceeding 0.9. The novelty of this study lies in its empirical evaluation of different approaches to implementing GP-based AutoML. These findings give business investors alternative ways to identify companies with strong sustainability practices. © 2024 The Authors.
|