Summary: | Missing data are a pervasive issue in several fields, including environmental research. They cause serious problems that can significantly hinder the interpretation of findings. Missing data in ecological research are usually due to mechanical malfunction, regular maintenance, and human error. The key to selecting an appropriate imputation technique is understanding which missing data mechanism is present. Missing data analysis methods are developed only for specific missing data mechanisms, so an imputation technique may yield biased results when it is applied under a mechanism it was not designed for. In air quality data, the missing data mechanism is generally random, with the missing values being either missing at random (MAR) or missing completely at random (MCAR). Therefore, this study aims to identify which missing data mechanism applies to incomplete air pollution data sets in Malaysia. It utilised 15 years (2002-2016) of monitoring records on PM10, SO2, CO, O3, and NO2 from the Alor Setar station, which falls in the urban area category. The percentage of missing values was identified for each variable individually. The pattern of missingness was analysed using an independent t-test and logistic regression. A significant p-value is evidence against the null hypothesis. The t-test result was significant, which showed that the missing air pollution data were MAR or missing not at random (MNAR) rather than MCAR. For that reason, a logistic regression analysis was performed, and the result was significant. Thus, the missing data mechanism for air pollution data in Malaysia was MAR. It is essential to determine the correct missing data mechanism so that any imputation method applied to the incomplete data set will not produce biased results. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
|
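The t-test step described in the summary can be illustrated with a minimal sketch. It is not the study's code or data: the CO values, the missingness model, and the helper name `welch_t` are hypothetical, the data are synthetic, and the sketch assumes a Welch (unequal-variance) t statistic with a normal-approximation p-value in place of the independent t-test used in the study. The idea is to split an observed variable (here CO) by whether another variable (here PM10) is missing, then compare the two group means.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return the Welch t statistic and a two-sided p-value.

    The p-value uses a normal approximation to the t distribution,
    which is an assumption that is reasonable only for large samples.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Synthetic, hypothetical data: PM10 is made more likely to be missing
# when CO is high, so missingness depends on an observed variable.
random.seed(0)
co = [random.gauss(1.0, 0.3) for _ in range(2000)]
pm10_missing = [
    random.random() < 1.0 / (1.0 + math.exp(-(4.0 * (c - 1.0) - 2.0)))
    for c in co
]

# Split the observed variable by the missingness indicator of PM10.
co_when_missing = [c for c, m in zip(co, pm10_missing) if m]
co_when_observed = [c for c, m in zip(co, pm10_missing) if not m]

t_stat, p_value = welch_t(co_when_missing, co_when_observed)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

A small p-value here is evidence against MCAR, because the mean of CO differs between records with and without a PM10 value. It does not by itself distinguish MAR from MNAR, which is why the study follows the t-test with a logistic regression.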