Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models

Data imputation, the reconstruction or estimation of data gaps caused by system sensing failure and non-responsive data transmission, remains an open issue. In space weather applications, imputation of ground electromagnetism is significant in capturing the complex interaction o...

Bibliographic Details
Published in: Alexandria Engineering Journal
Main Author: H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
Format: Article
Language: English
Published: Elsevier B.V. 2022
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85107441696&doi=10.1016%2fj.aej.2021.04.096&partnerID=40&md5=323a2d67c3a327e9b17b974bbce98404
id 2-s2.0-85107441696
spelling 2-s2.0-85107441696
H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
2022
Alexandria Engineering Journal
61
1
10.1016/j.aej.2021.04.096
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85107441696&doi=10.1016%2fj.aej.2021.04.096&partnerID=40&md5=323a2d67c3a327e9b17b974bbce98404
Data imputation, the reconstruction or estimation of data gaps caused by system sensing failure and non-responsive data transmission, remains an open issue. In space weather applications, imputation of ground electromagnetism is significant in capturing the complex Sun–Earth interaction prior to the subsequent analysis of space weather effects. Key contributions to the demonstration of a supervised machine learning (ML) imputation approach with artificial neural network, K-nearest neighbour, support vector regression (SVR), and General Regression Neural Network (GRNN) for the MAGDAS-9 ground electromagnetism dataset have not yet been established. A total of 1,585,950 data points were analysed with supervised ML models, with performance benchmarked against conventional statistical methods, namely zero value substitution, listwise deletion, mean substitution, and hot deck imputation. To achieve low reconstruction errors, different imputation models with hyperparameter-tuned settings are varied, and computational execution time has been shown to contribute to imputation performance. Performance metrics measured by mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and execution time demonstrate the capability of SVR to impute missing data for all ground electromagnetism components at an average of 0.314 MSE, 0.738 MAPE, close to 0.510 MAE, and 0.91 seconds of execution time at various percentage levels of data missingness. A comparison with traditional imputation shows that the supervised ML with SVR model has improved imputation performance by up to 80% of data gap. The outcome of the proposed imputation will benefit space weather applications for event characterisation, which will cover a large number of missing data in the MAGDAS-9 dataset. © 2021 THE AUTHORS
Elsevier B.V.
11100168
English
Article
All Open Access; Gold Open Access
author H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
spellingShingle H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
author_facet H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
author_sort H. M.A.; K.A. N.D.; Md Tahir N.; Iffah Abd Latiff Z.; Huzaimy Jusoh M.; Akimasa Y.
title Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
title_short Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
title_full Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
title_fullStr Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
title_full_unstemmed Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
title_sort Missing data imputation of MAGDAS-9’s ground electromagnetism with supervised machine learning and conventional statistical analysis models
publishDate 2022
container_title Alexandria Engineering Journal
container_volume 61
container_issue 1
doi_str_mv 10.1016/j.aej.2021.04.096
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85107441696&doi=10.1016%2fj.aej.2021.04.096&partnerID=40&md5=323a2d67c3a327e9b17b974bbce98404
description Data imputation, the reconstruction or estimation of data gaps caused by system sensing failure and non-responsive data transmission, remains an open issue. In space weather applications, imputation of ground electromagnetism is significant in capturing the complex Sun–Earth interaction prior to the subsequent analysis of space weather effects. Key contributions to the demonstration of a supervised machine learning (ML) imputation approach with artificial neural network, K-nearest neighbour, support vector regression (SVR), and General Regression Neural Network (GRNN) for the MAGDAS-9 ground electromagnetism dataset have not yet been established. A total of 1,585,950 data points were analysed with supervised ML models, with performance benchmarked against conventional statistical methods, namely zero value substitution, listwise deletion, mean substitution, and hot deck imputation. To achieve low reconstruction errors, different imputation models with hyperparameter-tuned settings are varied, and computational execution time has been shown to contribute to imputation performance. Performance metrics measured by mean square error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and execution time demonstrate the capability of SVR to impute missing data for all ground electromagnetism components at an average of 0.314 MSE, 0.738 MAPE, close to 0.510 MAE, and 0.91 seconds of execution time at various percentage levels of data missingness. A comparison with traditional imputation shows that the supervised ML with SVR model has improved imputation performance by up to 80% of data gap. The outcome of the proposed imputation will benefit space weather applications for event characterisation, which will cover a large number of missing data in the MAGDAS-9 dataset. © 2021 THE AUTHORS
publisher Elsevier B.V.
issn 11100168
language English
format Article
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus
_version_ 1809677892878073856