Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods

Good-quality data are essential for reliable research results. However, data quality is often degraded by missing values, which reduce the accuracy of the analysis and can lead to biased results. In air quality data...


Bibliographic Details
Published in: Journal of Physics: Conference Series
Main Author: Shaadan N.; Rahim N.A.M.
Format: Conference paper
Language: English
Published: Institute of Physics Publishing 2019
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076091384&doi=10.1088%2f1742-6596%2f1366%2f1%2f012107&partnerID=40&md5=4e71203f3ced4d9c6ae703236a27c4d8
id 2-s2.0-85076091384
spelling 2-s2.0-85076091384
Shaadan N.; Rahim N.A.M.
Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods
2019
Journal of Physics: Conference Series
1366
1
10.1088/1742-6596/1366/1/012107
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076091384&doi=10.1088%2f1742-6596%2f1366%2f1%2f012107&partnerID=40&md5=4e71203f3ced4d9c6ae703236a27c4d8
Good-quality data are essential for reliable research results. However, data quality is often degraded by missing values, which reduce the accuracy of the analysis and can lead to biased results. In air quality data sets, missing values arise for various reasons, for example machine malfunction and errors, computer system crashes, human error and insufficient sampling. In time series modelling, a complete data series is essential for model construction. This paper presents a systematic statistical procedure for investigating the performance of several missing-value imputation methods when the data are a time series. The procedure could help researchers decide which type of imputation method suits their data. A case study was conducted using a real data set from the Shah Alam air quality monitoring station. The results show that the data at the monitoring station are missing completely at random (MCAR). Among the six imputation methods compared, and based on performance indicators such as RMSE, MAE, AI and R2, imputation using a Kalman filter with an ARIMA model is the most appropriate method for the data set. © Published under licence by IOP Publishing Ltd.
Institute of Physics Publishing
17426588
English
Conference paper
All Open Access; Gold Open Access
author Shaadan N.; Rahim N.A.M.
title Imputation Analysis for Time Series Air Quality (PM10) Data Set: A Comparison of Several Methods
publishDate 2019
container_title Journal of Physics: Conference Series
container_volume 1366
container_issue 1
doi_str_mv 10.1088/1742-6596/1366/1/012107
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076091384&doi=10.1088%2f1742-6596%2f1366%2f1%2f012107&partnerID=40&md5=4e71203f3ced4d9c6ae703236a27c4d8
publisher Institute of Physics Publishing
issn 17426588
language English
format Conference paper
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus
_version_ 1809677901459619840