Summary: | Predicting missing data is a ubiquitous problem in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for identifying the most informative features in a dataset before building a predictive model for the missing data. The mice package in R is one of the most popular tools for dealing with complex incomplete data. Its mice() function provides several imputation methods suited to numerical data, including predictive mean matching, random sampling from the observed data, and linear regression. However, the results of this study show only small differences in precision among these three imputation methods. Therefore, this paper proposes applying feature selection with the Boruta algorithm to select the most important features before building a missing data prediction model with the mice package. The results show that applying feature selection before imputing the missing data increases the precision of the prediction model, regardless of the proportion of missing data. © 2023 Author(s).
|