Hybrid embedded and filter feature selection methods in big-dimension mammary cancer and prostatic cancer data

Feature selection improves machine learning performance by increasing learning precision. Determining the best feature selection method for a given machine learning task on high-dimensional data is crucial. Therefore, the purpose of this study is to compare feature sele...

Bibliographic Details
Published in: IAES International Journal of Artificial Intelligence
Main Author: Md Noh S.S.; Ibrahim N.; Mansor M.M.; Md Ghani N.A.; Yusoff M.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200038464&doi=10.11591%2fijai.v13.i3.pp3101-3110&partnerID=40&md5=ba3f5442cc256848925d306572475933
id 2-s2.0-85200038464
container_volume 13
container_issue 3
doi_str_mv 10.11591/ijai.v13.i3.pp3101-3110
description Feature selection improves machine learning performance by increasing learning precision. Determining the best feature selection method for a given machine learning task on high-dimensional data is crucial. This study therefore compares feature selection methods, covering several filter methods (information gain (IG), chi-square, ReliefF) and embedded methods (Lasso, Ridge), each hybridized with logistic regression (LR). Sample sizes of n=100 and n=75 are chosen randomly, and reduced feature counts of d=50, 22, and 10 are applied. The feature reduction procedure uses all samples at each sample size. The results for each sample size are compared, including runs with no feature selection. The results indicate that LR+ReliefF is the best method for the mammary cancer data, whereas LR+IG is the best for the prostatic cancer data, which makes filter methods more suitable than embedded methods for high-dimensional data. The study shows that the features and size of the sample influence which feature selection method is most effective for high-dimensional data. It therefore provides insight into the most effective methods for particular features and sample sizes in high-dimensional data. © 2024, Institute of Advanced Engineering and Science. All rights reserved.
issn 20894872
accesstype All Open Access; Gold Open Access
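
The abstract in this record names information gain (IG) as one of the filter methods that reduce the data to d features before logistic regression. As a rough illustration only, the filter step might look like the pure-Python sketch below. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `select_top_d`) and the toy binary data are assumptions made for this example, and the ReliefF, chi-square, Lasso, Ridge, and logistic regression steps are not shown. The values n=100 and d=10 are taken from the abstract; the 200-feature width is arbitrary.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG of a discrete feature: H(y) - sum_v p(v) * H(y | feature = v)."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_d(X, y, d):
    """Return the indices of the d features with the highest information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


# Hypothetical toy data: 100 samples, 200 binary features.
# Feature 0 copies the label, feature 1 copies it with about 10% noise,
# and the remaining features are random noise.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(100)]
X = []
for label in y:
    row = [rng.randint(0, 1) for _ in range(200)]
    row[0] = label
    row[1] = label if rng.random() < 0.9 else 1 - label
    X.append(row)

selected = select_top_d(X, y, d=10)                     # indices of kept features
X_reduced = [[row[j] for j in selected] for row in X]   # 100 x 10 reduced data
```

In a hybrid setup of the kind the abstract describes, `X_reduced` would then be passed to a classifier such as logistic regression, and the result compared against a run on the full feature set.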