Comparison of single and MICE imputation methods for missing values: A simulation study

Bibliographic Details
Published in: Pertanika Journal of Science and Technology
Authors: Pauzi N.A.M.; Wah Y.B.; Deni S.M.; Rahim S.K.N.A.; Suhartono
Format: Article
Language: English
Published: Universiti Putra Malaysia Press 2021
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85106861952&doi=10.47836%2fpjst.29.2.15&partnerID=40&md5=33c00e57e50b9f865aebb2fc1868ed20
Description
Summary: High-quality data are essential for valid research findings in every field. Missing data are common and arise for many reasons, such as incomplete responses, equipment malfunction and data entry errors. Both single and multiple imputation methods have been developed to fill in missing values. This study used a simulation to compare single imputation by the mean with multiple imputation by Multivariate Imputation by Chained Equations (MICE). Values missing completely at random (MCAR) were generated at ten missing rates (proportions of missing data) from 5% to 50% for different sample sizes, and the mean squared error (MSE) was used to evaluate the imputation methods. The appropriate imputation method depends on the data type: mean imputation is commonly used for continuous variables, whereas MICE can handle both continuous and categorical variables. The simulation results indicate that group mean imputation (GMI) performed better than overall mean imputation (OMI) and MICE, giving the lowest MSE for all sample sizes and missing rates. The MSE of OMI, GMI and MICE increases as the missing rate increases, and MICE performs worst (i.e. has the highest MSE) when the missing rate exceeds 15%. Overall, GMI outperforms OMI and MICE for all missing rates and sample sizes under the MCAR mechanism. An application to a real dataset confirmed the simulation findings. These findings can help researchers and practitioners choose a suitable imputation method when their data contain missing values. © Universiti Putra Malaysia Press.
ISSN: 0128-7680
DOI: 10.47836/pjst.29.2.15
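The summary describes the simulation design only in outline. Below is a minimal Python sketch of its single-imputation side (MCAR deletion, overall mean imputation, group mean imputation, MSE). It rests on assumptions that the record does not state: a hypothetical data-generating process (a normal outcome whose mean depends on a two-level group variable), hypothetical sample sizes of 100 and 500, MCAR deletion of exactly round(rate × n) values, and MSE computed only over the deleted cells. It is not the authors' code. It also omits MICE, which is typically run with a dedicated implementation such as the R package mice.

```python
import random
import statistics


def simulate(n, group_means, sd, seed):
    """Hypothetical data-generating process (assumption): a continuous
    outcome y whose mean depends on a categorical group label g."""
    rng = random.Random(seed)
    groups = [rng.randrange(len(group_means)) for _ in range(n)]
    y = [rng.gauss(group_means[g], sd) for g in groups]
    return groups, y


def make_mcar(y, rate, seed):
    """Delete round(rate * n) values of y completely at random (MCAR):
    whether a value goes missing does not depend on any data value."""
    rng = random.Random(seed)
    n_missing = round(rate * len(y))
    missing_idx = set(rng.sample(range(len(y)), n_missing))
    y_obs = [None if i in missing_idx else v for i, v in enumerate(y)]
    return y_obs, sorted(missing_idx)


def overall_mean_impute(y_obs):
    """OMI: replace every missing value with the mean of all observed values."""
    m = statistics.fmean(v for v in y_obs if v is not None)
    return [m if v is None else v for v in y_obs]


def group_mean_impute(y_obs, groups):
    """GMI: replace a missing value with the mean of the observed values in
    its group; fall back to the overall mean if a group has no observed values."""
    overall = statistics.fmean(v for v in y_obs if v is not None)
    observed_by_group = {}
    for g, v in zip(groups, y_obs):
        if v is not None:
            observed_by_group.setdefault(g, []).append(v)
    means = {g: statistics.fmean(vs) for g, vs in observed_by_group.items()}
    return [means.get(g, overall) if v is None else v for g, v in zip(groups, y_obs)]


def mse(y_true, y_imputed, idx):
    """Mean squared error over the deleted cells only (assumed evaluation set)."""
    return statistics.fmean((y_true[i] - y_imputed[i]) ** 2 for i in idx)


def run_simulation(sample_sizes=(100, 500), group_means=(0.0, 10.0), sd=1.0, seed=2021):
    """Compare OMI and GMI at ten missing rates: 5%, 10%, ..., 50%."""
    rates = [k / 100 for k in range(5, 55, 5)]
    results = {}
    for n in sample_sizes:
        groups, y = simulate(n, group_means, sd, seed)
        for rate in rates:
            y_obs, idx = make_mcar(y, rate, seed + 1)
            results[(n, rate)] = {
                "OMI": mse(y, overall_mean_impute(y_obs), idx),
                "GMI": mse(y, group_mean_impute(y_obs, groups), idx),
            }
    return results


if __name__ == "__main__":
    for (n, rate), scores in run_simulation().items():
        print(f"n={n:4d} rate={rate:.2f} OMI={scores['OMI']:.3f} GMI={scores['GMI']:.3f}")
```

In this toy setup the group means are far apart, so GMI can use the grouping variable while OMI cannot. That is one plausible reason group means help. The MSE values the authors actually observed, including those for MICE, are reported in the article and are not reproduced by this sketch.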