Annual workers' income prediction using data mining techniques

Predicting workers' annual income requires a close look at several factors. The factors most often discussed are age, gender, education and occupation. Other factors that may affect workers' annual income have yet to be discussed. The traditional way of p...

Bibliographic Details
Published in: AIP Conference Proceedings
Main Author: Yahaya M.S.; Hasbullah M.H.; Jamil S.A.M.; Ul-Saufie A.Z.; Ibrahim N.
Format: Conference paper
Language: English
Published: American Institute of Physics 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203165759&doi=10.1063%2f5.0223795&partnerID=40&md5=93155263c25603438494517f24d1fa59
id 2-s2.0-85203165759
spelling 2-s2.0-85203165759
Yahaya M.S.; Hasbullah M.H.; Jamil S.A.M.; Ul-Saufie A.Z.; Ibrahim N.
Annual workers' income prediction using data mining techniques
2024
AIP Conference Proceedings
3123
1
10.1063/5.0223795
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203165759&doi=10.1063%2f5.0223795&partnerID=40&md5=93155263c25603438494517f24d1fa59
Predicting workers' annual income requires a close look at several factors. The factors most often discussed are age, gender, education and occupation. Other factors that may affect workers' annual income have yet to be discussed. The traditional method for predicting workers' annual income is multiple linear regression. This parametric approach requires its assumptions to be satisfied, and checking them is time-consuming. A data mining approach to predicting workers' income is important for understanding how the economy and compensation work in the United States. Unlike the traditional method, machine learning can cover all of these aspects without requiring such assumptions to be fulfilled. Hence, machine learning is the best way to predict workers' income in the United States, and it concurrently addresses SDG 8: Decent Work & Economic Growth. The dataset used in this study was acquired from the Kaggle website. First, feature weights from filter methods (Weight by Information Gain, Weight by Information Gain Ratio and Weight by Chi-Squared Statistic) were used to identify the factors that influence workers' annual income. Three methods were used to predict worker income: logistic regression, decision trees, and artificial neural networks. The second goal is to compare the effectiveness of worker income prediction under undersampling and oversampling techniques. The results show that, with the exception of the decision tree, the oversampling strategy gives better prediction performance than the undersampling technique. Undersampling randomly deletes observations, some of which may be significant to the data and affect the prediction model, which is why oversampling performs better than undersampling.
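The contrast between the two sampling strategies can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes random undersampling (as the abstract describes) and plain random duplication for oversampling (the abstract does not say which oversampling variant was used). The function names, the `label_of` parameter and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def split_by_class(rows, label_of):
    """Group rows by their class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return groups


def random_undersample(rows, label_of, seed=0):
    """Randomly keep only as many rows per class as the smallest class has.

    Rows from larger classes are discarded, so potentially informative
    observations can be lost.
    """
    rng = random.Random(seed)
    groups = split_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size.

    Every original row is kept; only copies are added (assumed variant).
    """
    rng = random.Random(seed)
    groups = split_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical toy data: (row id, income class), imbalanced 8 to 2.
toy_rows = [(i, "low") for i in range(8)] + [(i, "high") for i in range(2)]
under = random_undersample(toy_rows, lambda row: row[1])
over = random_oversample(toy_rows, lambda row: row[1])
print(len(under), len(over))
```

On the toy data, undersampling throws away six of the eight majority rows, while oversampling keeps all ten original rows and adds copies of the minority class. That is the loss of observations the abstract points to when explaining why oversampling performed better.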
The third goal is to identify the most effective classification model for predicting workers' income. For logistic regression, the best model combines oversampling with backward selection. For decision trees, the optimal model combines backward selection with undersampling. For artificial neural networks, the best model combines oversampling with backward selection. Overall, a data mining approach to predicting workers' income helps in understanding how the economy and compensation work in the United States. © 2024 Author(s).
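Backward selection, as named above, can be sketched as greedy elimination: start from all features and repeatedly drop the one whose removal does not hurt a chosen score. The sketch below is an assumption-laden illustration rather than the procedure used in the paper. `backward_selection`, `toy_score`, the feature names and the usefulness values are all hypothetical; in practice the score would be something like the validation accuracy of a classifier trained on the given subset.

```python
def backward_selection(features, score, min_features=1):
    """Greedy backward elimination.

    features: iterable of feature names.
    score: callable taking a tuple of feature names, returning a number
           where higher is better (hypothetical stand-in for model accuracy).
    Returns the selected feature list and its score.
    """
    selected = list(features)
    best = score(tuple(selected))
    while len(selected) > min_features:
        # Score each subset obtained by dropping exactly one feature.
        candidates = [
            (score(tuple(f for f in selected if f != drop)), drop)
            for drop in selected
        ]
        candidate_score, drop = max(candidates, key=lambda c: c[0])
        if candidate_score < best:
            break  # every possible removal makes the score worse
        selected.remove(drop)
        best = candidate_score
    return selected, best


def toy_score(subset):
    """Hypothetical score: made-up usefulness per feature minus a size penalty."""
    usefulness = {"age": 0.30, "education": 0.25, "occupation": 0.20}
    return sum(usefulness.get(f, 0.0) for f in subset) - 0.01 * len(subset)


all_features = ["age", "education", "occupation", "noise_a", "noise_b"]
selected, best = backward_selection(all_features, toy_score)
print(selected)
```

With this toy score the two uninformative features are removed first, and the loop stops once dropping any remaining feature would lower the score.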
American Institute of Physics
0094243X
English
Conference paper

author Yahaya M.S.; Hasbullah M.H.; Jamil S.A.M.; Ul-Saufie A.Z.; Ibrahim N.
title Annual workers' income prediction using data mining techniques
publishDate 2024
container_title AIP Conference Proceedings
container_volume 3123
container_issue 1
doi_str_mv 10.1063/5.0223795
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203165759&doi=10.1063%2f5.0223795&partnerID=40&md5=93155263c25603438494517f24d1fa59
publisher American Institute of Physics
issn 0094243X
language English
format Conference paper
accesstype
record_format scopus
collection Scopus
_version_ 1812871793521721344