Preliminary study on multiple imputation for nonresponse in survey data with feature selection

Missing data prediction is ubiquitous in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for finding the best features in a dataset before building a predictive model for missing data. The mice package in R pro...


Bibliographic Details
Published in: AIP Conference Proceedings
Main Author: Jasin A.M.; Asmat A.
Format: Conference paper
Language: English
Published: American Institute of Physics Inc. 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85166536489&doi=10.1063%2f5.0129514&partnerID=40&md5=1507078e030d5a2a896354e9e8f507f5
id 2-s2.0-85166536489

author Jasin A.M.; Asmat A.
title Preliminary study on multiple imputation for nonresponse in survey data with feature selection
publishDate 2023
container_title AIP Conference Proceedings
container_volume 2608
doi_str_mv 10.1063/5.0129514
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85166536489&doi=10.1063%2f5.0129514&partnerID=40&md5=1507078e030d5a2a896354e9e8f507f5
description Missing data prediction is ubiquitous in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for finding the best features in a dataset before building a predictive model for missing data. The mice package in R is the most popular tool for dealing with complex incomplete data. Its mice() function provides several imputation techniques appropriate for numerical data, such as predictive mean matching, random sampling from observed data, and linear regression. However, the results of this study show only a slight difference in precision among these three imputation methods. Therefore, this paper proposes applying feature selection with the Boruta algorithm to select the most important features before building a missing data prediction model with the mice package. The results show that applying feature selection before imputing the missing data increases the precision of the prediction model regardless of the proportion of missing data. © 2023 Author(s).
publisher American Institute of Physics Inc.
issn 0094243X
language English
format Conference paper
record_format scopus
collection Scopus
_version_ 1809677777965678592
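
Illustrative note: the description above names predictive mean matching as one of the imputation methods available through mice(). The sketch below is not the authors' code and does not use or reproduce the mice package. It is a simplified, standard-library Python illustration of the idea with a single predictor, assuming a plain least-squares fit where mice draws the regression parameters at random. The data and the names fit_line, pmm_impute, and multiply_impute are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, donors=3, rng=None):
    """Return a copy of ys with None entries filled by predictive mean matching."""
    rng = rng or random.Random()
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])
    # Predicted mean of each observed case, paired with its real value.
    pool = [(intercept + slope * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)
            continue
        target = intercept + slope * x
        # Candidate donors: observed cases whose predictions are closest.
        nearest = sorted(pool, key=lambda p: abs(p[0] - target))[:donors]
        # Impute a real observed value borrowed from a random donor.
        completed.append(rng.choice(nearest)[1])
    return completed


def multiply_impute(xs, ys, m=5, seed=0):
    """Create m completed copies of ys; the random donor draws make them differ."""
    rng = random.Random(seed)
    return [pmm_impute(xs, ys, rng=rng) for _ in range(m)]


# Hypothetical survey: x is fully observed, y has item nonresponse (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]

imputations = multiply_impute(x, y, m=5, seed=42)
# Pooled point estimate of the mean: average of the m per-dataset means.
pooled_mean = sum(sum(d) / len(d) for d in imputations) / len(imputations)
print(pooled_mean)
```

Because every imputed value is copied from an observed case, the completed data never contain values outside the observed range, which is one reason the method suits numerical survey items. A feature selection step such as the one the description proposes would run before this stage to decide which predictors enter the imputation model.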