Supervised feature selection using principal component analysis

The principal component analysis (PCA) is widely used in computational science branches such as computer science, pattern recognition, and machine learning, as it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method used for feature...


Bibliographic Details
Published in:	Knowledge and Information Systems
Main Authors:	Rahmat F.; Zulkafli Z.; Ishak A.J.; Abdul Rahman R.Z.; Stercke S.D.; Buytaert W.; Tahir W.; Ab Rahman J.; Ibrahim S.; Ismail M.
Format:	Article
Language:	English
Published:	Springer Science and Business Media Deutschland GmbH 2024
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85176106868&doi=10.1007%2fs10115-023-01993-5&partnerID=40&md5=7625ea19a71f40771238c85b88344422

id	2-s2.0-85176106868
author	Rahmat F.; Zulkafli Z.; Ishak A.J.; Abdul Rahman R.Z.; Stercke S.D.; Buytaert W.; Tahir W.; Ab Rahman J.; Ibrahim S.; Ismail M.
title	Supervised feature selection using principal component analysis
publishDate	2024
container_title	Knowledge and Information Systems
container_volume	66
container_issue	3
doi_str_mv	10.1007/s10115-023-01993-5
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85176106868&doi=10.1007%2fs10115-023-01993-5&partnerID=40&md5=7625ea19a71f40771238c85b88344422
description	The principal component analysis (PCA) is widely used in computational science branches such as computer science, pattern recognition, and machine learning, as it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method used for feature extraction. In this study, we explore PCA’s ability for feature selection in regression applications. We introduce a new approach using PCA, called Targeted PCA to analyze a multivariate dataset that includes the dependent variable—it identifies the principal component with a high representation of the dependent variable and then examines the selected principal component to capture and rank the contribution of the non-dependent variables. The study also compares the feature selected with that resulting from a Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features were tested in two regression models: multiple linear regression (MLR) and artificial neural network (ANN). The results are presented for three socioeconomic, environmental, and computer image processing datasets. Our study found that 2 of 3 random datasets have more than 50% similarity in the selected features by the PCA and LASSO regression methods. In the regression predictions, our PCA-selected features resulted in little difference compared to the LASSO regression-selected features in terms of the MLR prediction accuracy. However, the ANN regression demonstrated a faster convergence and a higher reduction of error. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2023.
publisher	Springer Science and Business Media Deutschland GmbH
issn	0219-1377
language	English
format	Article
record_format	scopus
collection	Scopus
_version_	1809677883687305216
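
The Targeted PCA procedure summarized in the description field (run PCA on a dataset that includes the dependent variable, pick the component that best represents that variable, then rank the remaining variables by their contribution to it) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `targeted_pca_rank`, the use of NumPy, and the selection rule (choose the component whose correlation with the dependent variable is largest in absolute value) are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def targeted_pca_rank(X, y, feature_names):
    """Rank features by their loading on the PC that best represents y.

    Hypothetical helper: the selection rule below is an assumption,
    not a detail taken from the paper.
    """
    # Put the dependent variable in the last column of the data matrix.
    data = np.column_stack([X, y])

    # PCA on standardized data is an eigendecomposition of the
    # correlation matrix; columns of eigvecs are the components.
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negatives

    # Loadings: correlation between each variable (row) and each PC (column).
    loadings = eigvecs * np.sqrt(eigvals)

    # Assumed rule: the target PC is the one most correlated with y.
    target_pc = int(np.argmax(np.abs(loadings[-1])))

    # Rank the non-dependent variables by absolute loading on that PC.
    contrib = np.abs(loadings[:-1, target_pc])
    order = np.argsort(contrib)[::-1]
    return [(feature_names[i], float(contrib[i])) for i in order]
```

Called with a feature matrix `X` of shape `(n_samples, n_features)`, a response vector `y`, and a list of feature names, the function returns `(name, loading)` pairs sorted from the largest to the smallest contribution. A cut-off for how many features to keep is left to the caller, since the abstract does not state one.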

