Identifying Missing Data Mechanisms Among Incomplete Air Pollution Datasets in Malaysia


Bibliographic Details
Published in:	Advances in Science, Technology and Innovation
Main Author:	Libasin Z.; Ul-Saufie A.Z.; Ahmat H.; Shaziayani W.N.; Al-Jumeily D.
Format:	Conference paper
Language:	English
Published:	Springer Nature 2024
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199328091&doi=10.1007%2f978-3-031-43922-3_18&partnerID=40&md5=071f88a5a0eaf7dd92f42a142ebb6eab

Description
Summary:	In several fields, including environmental research, missing data are a pervasive issue. They cause serious problems that may create significant obstacles when interpreting findings. Missing data in ecological research are usually due to mechanical malfunction, regular maintenance, and human mistakes. The key to selecting a correct imputation technique is understanding which missing data mechanism is at work. Missing data analysis methods are developed only for specific missing data mechanisms; thus, an imputation technique may yield biased results when it is applied under the wrong mechanism. In air quality data, the missing data mechanism is generally random, with the missing values associated with MAR (missing at random) or MCAR (missing completely at random). Therefore, this study aims to identify which missing data mechanism applies to incomplete air pollution data sets in Malaysia. It utilised 15 years (2002-2016) of monitoring records on PM10, SO2, CO, O3, and NO2 from the Alor Setar station, which falls in the urban area category. The percentage of missing values was identified for each variable individually. The pattern of missingness was analysed using an independent t-test and logistic regression. A significant p-value is evidence against the null hypothesis. The t-test results showed that the missing air pollution data were MAR or MNAR (missing not at random). For that reason, a logistic regression analysis was performed, and the result was significant. Thus, the missing data mechanism for air pollution data in Malaysia was MAR. It is essential to determine the correct missing data mechanism so that any imputation method applied to the incomplete dataset will not produce biased results. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
ISSN:	25228714
DOI:	10.1007/978-3-031-43922-3_18
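
The first two steps named in the summary (the per-variable percentage of missing values, then a t-test on a missingness indicator) can be illustrated with a minimal sketch. It is not the authors' code: the function names `percent_missing` and `welch_t` are hypothetical, the readings are made-up toy values rather than Alor Setar data, and Welch's unequal-variance form of the t statistic is an assumption, since the summary only says "independent t-test". The sketch stops at the t statistic; a p-value would additionally need the t distribution (for example via `scipy.stats.ttest_ind`), and the logistic regression step is not shown.

```python
import math
from statistics import mean, variance


def percent_missing(values):
    """Return the share of entries that are None, as a percentage."""
    return 100.0 * sum(v is None for v in values) / len(values)


def welch_t(target, other):
    """Welch t statistic for `other`, split by whether `target` is missing.

    Group 1: rows where `target` is None.
    Group 2: rows where `target` is observed.
    Rows where `other` itself is None are skipped.
    """
    miss = [o for t, o in zip(target, other) if t is None and o is not None]
    obs = [o for t, o in zip(target, other) if t is not None and o is not None]
    if len(miss) < 2 or len(obs) < 2:
        raise ValueError("need at least two values in each group")
    # Standard error without assuming equal group variances (Welch).
    se = math.sqrt(variance(miss) / len(miss) + variance(obs) / len(obs))
    return (mean(miss) - mean(obs)) / se


# Made-up toy readings for illustration only (not data from the study).
pm10 = [41.0, None, 38.0, None, 45.0, 40.0, None, 39.0]
o3 = [0.020, 0.051, 0.018, 0.049, 0.022, 0.019, 0.055, 0.021]

pct = percent_missing(pm10)
t_stat = welch_t(pm10, o3)
print(f"PM10 missing: {pct:.1f}%")
print(f"t statistic for O3 (PM10 missing vs observed): {t_stat:.2f}")
```

In this toy setup, a large t statistic means O3 differs between rows where PM10 is missing and rows where it is observed, which is evidence against MCAR. As the summary notes, such a test alone cannot separate MAR from MNAR.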

Similar Items