Summary: | The principal component analysis (PCA) is widely used in computational science branches such as computer science, pattern recognition, and machine learning, as it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method used for feature extraction. In this study, we explore PCA’s ability for feature selection in regression applications. We introduce a new approach using PCA, called Targeted PCA to analyze a multivariate dataset that includes the dependent variable—it identifies the principal component with a high representation of the dependent variable and then examines the selected principal component to capture and rank the contribution of the non-dependent variables. The study also compares the feature selected with that resulting from a Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features were tested in two regression models: multiple linear regression (MLR) and artificial neural network (ANN). The results are presented for three socioeconomic, environmental, and computer image processing datasets. Our study found that 2 of 3 random datasets have more than 50% similarity in the selected features by the PCA and LASSO regression methods. In the regression predictions, our PCA-selected features resulted in little difference compared to the LASSO regression-selected features in terms of the MLR prediction accuracy. However, the ANN regression demonstrated a faster convergence and a higher reduction of error. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023.
|