Comparisons of imputation methods on different types of survey research data: A continuous variable

Missing data problems are often unavoidable and affect the outcome of many studies. Insufficient data lead to inaccurate results and predictions in many statistical analyses. In survey studies, datasets with missing values require some imputation method to continue with reliable stati...

Bibliographic Details
Published in:	AIP Conference Proceedings
Main Author:	Rahman H.A.A.; Hidayat T.; Rahman A.A.; Razif A.M.
Format:	Conference paper
Language:	English
Published:	American Institute of Physics 2024
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203152282&doi=10.1063%2f5.0225435&partnerID=40&md5=c3a339b3e483de40ed29a1a0aa79b6f9

id	2-s2.0-85203152282
author	Rahman H.A.A.; Hidayat T.; Rahman A.A.; Razif A.M.
title	Comparisons of imputation methods on different types of survey research data: A continuous variable
publishDate	2024
container_title	AIP Conference Proceedings
container_volume	3123
container_issue	1
doi_str_mv	10.1063/5.0225435
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85203152282&doi=10.1063%2f5.0225435&partnerID=40&md5=c3a339b3e483de40ed29a1a0aa79b6f9
description	Missing data problems are often unavoidable and affect the outcome of many studies. Insufficient data lead to inaccurate results and predictions in many statistical analyses. In survey studies, datasets with missing values require some imputation method before reliable statistical analysis can continue. However, the large number of available imputation methods makes the choice confusing. This study therefore compiles the characteristics of missing values in survey data and maps them to suggested imputation methods. In addition, the performance of five missing data imputation methods was compared: mean imputation, median imputation, deterministic regression imputation, stochastic regression imputation, and predictive mean matching (PMM). Two survey datasets were used, and the performance of the five methods was evaluated using root-mean-square error (RMSE). Results indicated that deterministic regression imputation performed best (RMSE = 0.3674863) and predictive mean matching performed worst (RMSE = 0.3780853) on the survey dataset "Malaysian Perception on Rising Cost of Living". For the second survey dataset, "A Retrospective International Study on Factors Associated with Injury, Discomfort, and Pain Perception among Cyclists", the result was reversed: predictive mean matching performed best (RMSE = 0.4223341) and deterministic regression imputation performed worst (RMSE = 0.3780853). In conclusion, the selection of imputation methods should be based on the type of variable and the unique features of the datasets. © 2024 Author(s).
publisher	American Institute of Physics
issn	0094243X
language	English
format	Conference paper
accesstype
record_format	scopus
collection	Scopus
_version_	1812871793558421504
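
The description field above compares imputation methods by RMSE. As a rough illustration of that kind of evaluation, the sketch below masks known values, fills them with three of the five named methods (mean, median, and deterministic regression imputation), and scores each by RMSE on the masked positions. It is not the authors' code: the toy data, the single predictor `x`, and all function names are assumptions made for illustration, and stochastic regression imputation and PMM are left out.

```python
import math
import statistics


def rmse(true_values, imputed_values):
    """Root-mean-square error between held-out true values and their imputations."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def impute_constant(y, fill):
    """Replace every missing entry (None) in y with one constant."""
    return [fill if v is None else v for v in y]


def impute_regression(x, y):
    """Deterministic regression imputation with a single predictor x.

    Fits y = a + b*x by least squares on the complete cases, then fills
    each missing y with the fitted value (no random residual is added).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = statistics.fmean(p[0] for p in pairs)
    mean_y = statistics.fmean(p[1] for p in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical toy data, not taken from either survey in the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_true = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
missing = [2, 4]  # positions masked to simulate nonresponse
y_obs = [None if i in missing else v for i, v in enumerate(y_true)]

# Fill the masked entries with each method.
observed = [v for v in y_obs if v is not None]
candidates = {
    "mean": impute_constant(y_obs, statistics.fmean(observed)),
    "median": impute_constant(y_obs, statistics.median(observed)),
    "regression": impute_regression(x, y_obs),
}

# Score each method only on the positions that were masked.
held_out = [y_true[i] for i in missing]
scores = {
    name: rmse(held_out, [filled[i] for i in missing])
    for name, filled in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: RMSE = {score:.4f}")
```

On this toy data the regression fill scores lowest because `y` is nearly linear in `x`. That ranking is a property of the invented data and says nothing about the two surveys in the record.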

Similar Items