Identifying Missing Data Mechanisms Among Incomplete Air Pollution Datasets in Malaysia

In several fields, including environmental research, missing data are a pervasive issue. They cause serious problems that may lead to significant obstacles when interpreting the findings. Missing data in ecological research are usually due to mechanical malfunction, regular maintenance, and human mis...


Bibliographic Details
Published in: Advances in Science, Technology and Innovation
Main Authors: Libasin Z.; Ul-Saufie A.Z.; Ahmat H.; Shaziayani W.N.; Al-Jumeily D.
Format: Conference paper
Language: English
Published: Springer Nature 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199328091&doi=10.1007%2f978-3-031-43922-3_18&partnerID=40&md5=071f88a5a0eaf7dd92f42a142ebb6eab
id 2-s2.0-85199328091
10.1007/978-3-031-43922-3_18
Missing data are a pervasive issue in several fields, including environmental research. They cause serious problems that can hinder the interpretation of findings. Missing data in ecological research are usually due to mechanical malfunction, regular maintenance, and human mistakes. The key to selecting a correct imputation technique is understanding which missing data mechanism is present: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). Missing data analysis methods are developed only for specific missing data mechanisms, so an imputation technique may yield biased results when it is applied under a mechanism it was not designed for. In air quality data, the missing data mechanism is generally random, that is, the missing values are associated with MAR or MCAR. Therefore, this study aims to identify which missing data mechanism applies to incomplete air pollution data sets in Malaysia. It utilised 15 years (2002-2016) of monitoring records on PM10, SO2, CO, O3, and NO2 from the Alor Setar station, which is in the urban area category. The percentage of missing values was identified for each variable individually. The pattern of missingness was analysed using an independent t-test and logistic regression. A significant p-value is evidence against the null hypothesis. The t-test was significant, indicating that the missing air pollution data were MAR or MNAR rather than MCAR. For that reason, a logistic regression analysis was performed, and the result was significant. Thus, the missing data mechanism for air pollution data in Malaysia was identified as MAR. It is essential to determine the correct missing data mechanism so that any imputation method applied to the incomplete dataset will not produce biased results. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
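The t-test step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, the function names (`welch_t_test`, `split_by_missingness`, `simulate`) and the choice of O3 as the covariate are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples. The idea is to split an observed covariate by whether the target pollutant is missing and compare the two group means; a significant difference is evidence against MCAR. The logistic regression step is not shown.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(group_a, group_b):
    """Return (t statistic, approximate two-sided p-value) for two samples.

    The p-value uses a normal approximation (an assumption of this sketch),
    which is only reasonable for large samples.
    """
    n_a, n_b = len(group_a), len(group_b)
    se = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    t_stat = (mean(group_a) - mean(group_b)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


def split_by_missingness(target, covariate):
    """Split observed covariate values by whether the target is missing (None)."""
    pairs = [(t, c) for t, c in zip(target, covariate) if c is not None]
    when_missing = [c for t, c in pairs if t is None]
    when_observed = [c for t, c in pairs if t is not None]
    return when_missing, when_observed


def simulate(mechanism, n=5000, seed=0):
    """Synthetic data, not from the study: PM10 is deleted either uniformly
    at random ("mcar") or more often when the covariate O3 is high ("mar")."""
    rng = random.Random(seed)
    o3 = [rng.gauss(30.0, 8.0) for _ in range(n)]
    pm10 = []
    for value in o3:
        if mechanism == "mcar":
            drop_prob = 0.2
        else:
            drop_prob = 0.4 if value > 30.0 else 0.05
        pm10.append(None if rng.random() < drop_prob else rng.gauss(50.0, 10.0))
    return pm10, o3


for mechanism in ("mcar", "mar"):
    pm10, o3 = simulate(mechanism)
    missing, observed = split_by_missingness(pm10, o3)
    t_stat, p_value = welch_t_test(missing, observed)
    print(f"{mechanism}: t = {t_stat:.2f}, approx p = {p_value:.4f}")
```

In the "mar" case the covariate mean differs strongly between the two groups, so the test rejects MCAR. A test of this kind cannot by itself separate MAR from MNAR, which is why the abstract reports a further logistic regression step.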
ISSN: 2522-8714

record_format scopus
collection Scopus