Preliminary study on multiple imputation for nonresponse in survey data with feature selection

Bibliographic Details
Published in: AIP Conference Proceedings
Main Author: Jasin A.M.; Asmat A.
Format: Conference paper
Language: English
Published: American Institute of Physics Inc. 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85166536489&doi=10.1063%2f5.0129514&partnerID=40&md5=1507078e030d5a2a896354e9e8f507f5
Description
Summary: Missing data prediction is ubiquitous in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for finding the best features in a dataset before building a predictive model for missing data. The mice package in R is the most popular tool for dealing with complex incomplete data. Its mice() function provides several imputation techniques appropriate for numerical data, such as predictive mean matching, random sampling from observed data, and linear regression. However, the results of this study show only a slight difference in precision among these three imputation methods. Therefore, this paper proposes applying feature selection with the Boruta algorithm to select the most important features before building a missing data prediction model with the mice package. The results show that applying feature selection before imputing the missing data increases the precision of the prediction model regardless of the proportion of missing data. © 2023 Author(s).
ISSN: 0094243X
DOI: 10.1063/5.0129514
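
The summary names predictive mean matching as one of the imputation methods compared. As a rough illustration of the idea only, here is a minimal Python sketch. It is not the mice implementation and not the authors' code: the helper names (fit_line, pmm_impute, multiply_impute), the single-predictor regression, the donor count of 3, and the toy data are all assumptions made for this example. It also omits the parameter-uncertainty step that a full multiple-imputation procedure would include, and it does not perform Boruta feature selection.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(x, y, donors=3, rng=None):
    """Return a copy of y with None entries filled by simplified
    predictive mean matching (hypothetical helper, illustration only)."""
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(y) if v is not None]
    # Fit the regression on cases where y is observed.
    a, b = fit_line([x[i] for i in observed], [y[i] for i in observed])
    predicted = [a + b * xi for xi in x]
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Donors: observed cases whose predicted value is closest
            # to the predicted value of the missing case.
            nearest = sorted(
                observed, key=lambda j: abs(predicted[j] - predicted[i])
            )[:donors]
            # Impute with the observed value of one randomly drawn donor.
            filled[i] = y[rng.choice(nearest)]
    return filled


def multiply_impute(x, y, m=5, seed=0):
    """Produce m completed copies of y, one per imputation."""
    rng = random.Random(seed)
    return [pmm_impute(x, y, rng=rng) for _ in range(m)]


# Toy data (made up for this sketch): y is missing at positions 1 and 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, None, 6.2, 7.9, None, 12.1]
completed = multiply_impute(x, y, m=5, seed=42)
```

Every imputed value in this sketch is an actually observed value borrowed from a donor, which is the property that distinguishes predictive mean matching from imputing the regression prediction directly.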