Application of functional data analysis for the treatment of missing air quality data
Missing recorded data are common in most research, including environmental research, and are a recurring data-quality problem. In this study, several imputation methods based on functional data analysis techniques are introduced and the capability of the meth...
Published in: | Sains Malaysiana |
---|---|
Main Author: | Shaadan N. |
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2015 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84952043609&doi=10.17576%2fjsm-2015-4410-19&partnerID=40&md5=5bab3eb018e5f5bfeaca2c37b86eb086 |
id | 2-s2.0-84952043609 |
---|---|
author | Shaadan N.; Deni S.M.; Jemain A.A. |
title | Application of functional data analysis for the treatment of missing air quality data |
publishDate | 2015 |
container_title | Sains Malaysiana |
container_volume | 44 |
container_issue | 10 |
doi_str_mv | 10.17576/jsm-2015-4410-19 |
url | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84952043609&doi=10.17576%2fjsm-2015-4410-19&partnerID=40&md5=5bab3eb018e5f5bfeaca2c37b86eb086 |
description | Missing recorded data are common in most research, including environmental research, and are a recurring data-quality problem. In this study, several imputation methods based on functional data analysis techniques are introduced, and their capability to estimate missing values is investigated. Single and iterative imputation methods are carried out by curve estimation using regression and roughness penalty smoothing approaches. The methods are compared on a reference data set of real PM10 data from the Petaling Jaya air quality monitoring station, located in the western part of Peninsular Malaysia. One hundred data sets with missing values, generated from the reference data set under six different missing-value patterns, are used to assess the methods. The patterns are simulated for three percentages of missing values (5, 10 and 15) combined with two maximum gap lengths (3 and 7 consecutive missing points). With the mean absolute error, the index of agreement and the coefficient of determination as performance indicators, the results show that the iterative imputation method using the roughness penalty approach is more flexible than, and superior to, the other methods. |
publisher | Penerbit Universiti Kebangsaan Malaysia |
issn | 0126-6039 |
language | English |
format | Article |
accesstype | All Open Access; Gold Open Access |
record_format | scopus |
collection | Scopus |
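
The evaluation design summarised in the description (missing values placed at a given percentage with a bounded gap length, then scored with three performance indicators) can be illustrated with a short Python sketch. This is not the authors' code. The record gives no formulas, so every function name, the gap-placement rule, the use of Willmott's form for the index of agreement, and the use of the squared Pearson correlation for the coefficient of determination are assumptions made here for illustration. The imputation step itself (regression or roughness penalty smoothing) is not shown.

```python
import random


def simulate_missing(n, pct_missing, max_gap, rng):
    """Pick indices to blank out of a series of length n.

    Assumed rule (not taken from the paper): draw runs of 1..max_gap
    consecutive points until pct_missing percent of the series is missing,
    keeping at least one observed point between runs so that no gap
    exceeds max_gap. Intended for series much longer than the gaps.
    """
    target = round(n * pct_missing / 100)
    missing = set()
    while len(missing) < target:
        gap = rng.randint(1, min(max_gap, target - len(missing)))
        start = rng.randrange(0, n - gap + 1)
        # Reject a run that overlaps or touches an existing gap.
        if any(i in missing for i in range(start - 1, start + gap + 1)):
            continue
        missing.update(range(start, start + gap))
    return sorted(missing)


def mean_absolute_error(obs, pred):
    """Average absolute difference between observed and imputed values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's d (assumed form); undefined if the denominator is zero."""
    obar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - obar) + abs(o - obar)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


def coefficient_of_determination(obs, pred):
    """Squared Pearson correlation (assumed form); needs non-constant input."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical series length; the six patterns follow the description.
    rng = random.Random(0)
    for pct in (5, 10, 15):
        for gap in (3, 7):
            idx = simulate_missing(240, pct, gap, rng)
            print(pct, gap, len(idx))
```

In use, the indicators would be computed only at the blanked-out positions, comparing the reference values with the values an imputation method fills in there.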