Preliminary study on multiple imputation for nonresponse in survey data with feature selection
Missing data prediction is ubiquitous in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for finding the best features in a dataset before building a predictive model for missing data. The mice package in R pro...
Published in: | AIP Conference Proceedings |
---|---|
Main Author: | Jasin A.M. |
Format: | Conference paper |
Language: | English |
Published: | American Institute of Physics Inc., 2023 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85166536489&doi=10.1063%2f5.0129514&partnerID=40&md5=1507078e030d5a2a896354e9e8f507f5 |
id |
2-s2.0-85166536489 |
---|---|
author |
Jasin A.M.; Asmat A. |
title |
Preliminary study on multiple imputation for nonresponse in survey data with feature selection |
publishDate |
2023 |
container_title |
AIP Conference Proceedings |
container_volume |
2608 |
container_issue |
|
doi_str_mv |
10.1063/5.0129514 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85166536489&doi=10.1063%2f5.0129514&partnerID=40&md5=1507078e030d5a2a896354e9e8f507f5 |
description |
Missing data prediction is ubiquitous in survey research. Multiple imputation is a common approach for handling missing observations in survey data. Feature selection is a technique for finding the best features in a dataset before building a predictive model for missing data. The mice package in R is the most popular tool for dealing with complex incomplete data. The mice() function provides several imputation techniques appropriate for numerical data, such as predictive mean matching, random sampling from the observed data, and linear regression. However, the results of this study show only a slight difference in precision among these three imputation methods. Therefore, this paper proposes applying feature selection with the Boruta algorithm to select the most important features before building a missing data prediction model with the mice package. The results show that applying feature selection before imputing the missing data increases the precision of the prediction model regardless of the proportion of missing data. © 2023 Author(s). |
publisher |
American Institute of Physics Inc. |
issn |
0094243X |
language |
English |
format |
Conference paper |
accesstype |
|
record_format |
scopus |
collection |
Scopus |
_version_ |
1809677777965678592 |
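
The workflow described in the abstract (select informative features first, then impute repeatedly) can be illustrated with a small sketch. This is not the authors' code and it does not use the R mice or Boruta packages. It is a standard-library Python approximation under stated assumptions: the data, the variable names (`features`, `income`) and the helper functions are all hypothetical; an absolute Pearson correlation ranking stands in for Boruta (which is a random-forest wrapper method); and the predictive mean matching step is simplified to one predictor with no posterior draw of the regression parameters.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_predictor(features, ys):
    """Pick the feature most correlated with ys on the complete cases.

    Simple stand-in for Boruta, used for illustration only.
    """
    rows = [i for i, y in enumerate(ys) if y is not None]
    target = [ys[i] for i in rows]
    return max(
        features,
        key=lambda name: abs(pearson([features[name][i] for i in rows], target)),
    )


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def pmm_impute(xs, ys, k=3, seed=0):
    """Simplified predictive mean matching for one incomplete variable.

    Each missing y is replaced by the observed y of a donor drawn at random
    from the k observed cases whose predicted values are closest.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed], [y for _, y in observed])
    donor_pool = [(intercept + slope * x, y) for x, y in observed]
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)  # observed values are never altered
            continue
        predicted = intercept + slope * x
        nearest = sorted(donor_pool, key=lambda d: abs(d[0] - predicted))[:k]
        completed.append(rng.choice(nearest)[1])
    return completed


# Hypothetical toy survey: "x" is informative, "z" is mostly noise.
features = {
    "x": [1, 2, 3, 4, 5, 6, 7, 8],
    "z": [5, 1, 4, 2, 8, 3, 7, 6],
}
income = [2.1, 3.9, None, 8.2, 9.8, None, 14.1, 16.0]

chosen = select_predictor(features, income)
# Multiple imputation: m = 5 completed datasets from different random seeds.
imputations = [pmm_impute(features[chosen], income, seed=s) for s in range(5)]
```

In R, the corresponding calls would be `Boruta()` followed by `mice()` with methods such as `"pmm"`, `"sample"` or `"norm"`; the sketch only mirrors the order of the two steps, not their statistical detail.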