Prediction of missing data in rainfall dataset by using simple statistical method

Bibliographic Details
Published in: IOP Conference Series: Earth and Environmental Science
Main Author: Mohd Jafri I.A.; Noor N.M.; Ul-Saufie A.Z.; Suwardi A.
Format: Conference paper
Language: English
Published: IOP Publishing Ltd 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85100054980&doi=10.1088%2f1755-1315%2f616%2f1%2f012005&partnerID=40&md5=d52e0b75ad445edf4ed271b62efec012
id 2-s2.0-85100054980
author Mohd Jafri I.A.; Noor N.M.; Ul-Saufie A.Z.; Suwardi A.
title Prediction of missing data in rainfall dataset by using simple statistical method
publishDate 2020
container_title IOP Conference Series: Earth and Environmental Science
container_volume 616
container_issue 1
doi_str_mv 10.1088/1755-1315/616/1/012005
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85100054980&doi=10.1088%2f1755-1315%2f616%2f1%2f012005&partnerID=40&md5=d52e0b75ad445edf4ed271b62efec012
description Almost all data obtained from hydrological stations contain missing values. This problem usually arises from equipment failure, maintenance work and human error. An incomplete dataset weakens statistical analysis and can bias estimates because of systematic differences between observed and unobserved data. In this study, four simple statistical methods (Series Mean, Average Mean Top Bottom, Linear Interpolation and Nearest Neighbour) were applied to predict the missing values in a rainfall dataset. Daily rainfall data from nine selected monitoring stations (2009 to 2018) were summarised using descriptive statistics. Missing values were then randomly simulated in the dataset at four percentages (5%, 10%, 15% and 20%) using the Statistical Package for the Social Sciences (SPSS) software. The performance of the imputation methods was evaluated using four performance indicators: Mean Absolute Error, Root Mean Squared Error, Prediction Accuracy and Index of Agreement. Overall, Linear Interpolation was selected as the best imputation method for predicting the missing data in the rainfall dataset. © 2020 Institute of Physics Publishing. All rights reserved.
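The workflow summarised in the description (mask known values at random, impute them, score the imputed values against the originals) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study used SPSS, and the abstract does not give formulas. The function names, the rainfall values and the edge handling are all hypothetical. The sketch assumes that Average Mean Top Bottom averages the nearest observed values before and after a gap, that Nearest Neighbour copies the closer of those two values, and that Index of Agreement follows Willmott's common form. Prediction Accuracy is left out because the abstract does not define it.

```python
import math
import random


def mask_randomly(series, fraction, seed=0):
    """Hide a random fraction of values; return the masked copy and the hidden indices."""
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(series)), round(fraction * len(series))))
    masked = [None if i in hidden else v for i, v in enumerate(series)]
    return masked, sorted(hidden)


def impute(series, method):
    """Fill None entries using one of four simple methods (definitions assumed, see lead-in)."""
    if method not in {"series_mean", "mean_top_bottom", "linear", "nearest"}:
        raise ValueError(f"unknown method: {method}")
    observed = [v for v in series if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mean = sum(observed) / len(observed)
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed positions before and after the gap (None at a series edge).
        left = next((j for j in range(i - 1, -1, -1) if series[j] is not None), None)
        right = next((j for j in range(i + 1, len(series)) if series[j] is not None), None)
        if method == "series_mean":
            filled[i] = mean
        elif left is None or right is None:
            # Assumption: a gap touching the edge copies the single available neighbour.
            filled[i] = series[right if left is None else left]
        elif method == "mean_top_bottom":
            filled[i] = (series[left] + series[right]) / 2
        elif method == "linear":
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
        else:  # "nearest": ties go to the earlier observation
            filled[i] = series[left] if i - left <= right - i else series[right]
    return filled


def mae(pred, obs):
    """Mean Absolute Error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root Mean Squared Error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed form); 1.0 means perfect agreement."""
    obs_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2 for p, o in zip(pred, obs))
    return 1.0 if den == 0 else 1 - num / den


# Made-up daily rainfall values (mm), for illustration only.
rain = [0.0, 2.5, 10.0, 4.0, 0.0, 0.0, 7.5, 1.0, 0.0, 3.0,
        12.0, 6.5, 0.0, 0.5, 0.0, 9.0, 2.0, 0.0, 0.0, 5.5]
masked, hidden = mask_randomly(rain, 0.20, seed=42)
truth = [rain[i] for i in hidden]
for method in ("series_mean", "mean_top_bottom", "linear", "nearest"):
    filled = impute(masked, method)
    pred = [filled[i] for i in hidden]
    print(method, mae(pred, truth), rmse(pred, truth), index_of_agreement(pred, truth))
```

Scoring only the hidden positions keeps untouched observations from inflating the indicators. Repeating the loop for fractions of 0.05, 0.10 and 0.15 would mirror the four missingness levels named in the description.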
publisher IOP Publishing Ltd
issn 17551307
language English
format Conference paper
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus