Missing data exploration in air quality data set using R-package data visualisation tools

Bibliographic Details
Published in: Bulletin of Electrical Engineering and Informatics
Main Author: Ghazali S.M.; Shaadan N.; Idrus Z.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2020
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083047189&doi=10.11591%2feei.v9i2.2088&partnerID=40&md5=909ffdb37449535c3d93fd96c88de6ec
id 2-s2.0-85083047189
author Ghazali S.M.; Shaadan N.; Idrus Z.
title Missing data exploration in air quality data set using R-package data visualisation tools
publishDate 2020
container_title Bulletin of Electrical Engineering and Informatics
container_volume 9
container_issue 2
doi_str_mv 10.11591/eei.v9i2.2088
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85083047189&doi=10.11591%2feei.v9i2.2088&partnerID=40&md5=909ffdb37449535c3d93fd96c88de6ec
description Missing values occur in data sets across many research areas. They are recognized as a data quality problem because they can degrade the performance of analysis results. To overcome the problem, an incomplete data set needs to be treated using an imputation method, so the pattern of missing values must be explored beforehand to determine a suitable method. This paper discusses the application of data visualisation as a smart technique for missing data exploration, aiming to increase understanding of missing data behaviour, including the missing data mechanism (missing completely at random, MCAR; missing at random, MAR; missing not at random, MNAR) and the distribution pattern of missingness in terms of percentage and gap size. It presents the application of several data visualisation tools from five R packages (visdat, VIM, ggplot2, Amelia and UpSetR) for exploring data missingness. As an illustration, several graphics were produced from an air quality data set in Malaysia to show how the visualisation tools provide insight into the pattern of missingness. The results show that missing values in the air quality data set of the chosen sites in Malaysia behave as missing at random (MAR), with a small percentage and long gap sizes of missingness. © 2020, Institute of Advanced Engineering and Science. All rights reserved.
publisher Institute of Advanced Engineering and Science
issn 2089-3191
language English
format Article
accesstype All Open Access; Gold Open Access; Green Open Access
record_format scopus
collection Scopus
_version_ 1820775465241018368
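
Illustrative note: the abstract characterises missingness by its percentage and by its gap size (the length of each run of consecutive missing values). The sketch below shows one way these two quantities could be computed. It is not the authors' code and does not use the R packages named in the record; it is a plain Python illustration, and the function names and the sample readings are hypothetical.

```python
from itertools import groupby


def missing_percentage(series):
    """Return the percentage of entries that are missing (None)."""
    if not series:
        return 0.0
    return 100.0 * sum(v is None for v in series) / len(series)


def gap_sizes(series):
    """Return the length of each run of consecutive missing values, in order."""
    return [
        sum(1 for _ in run)
        for is_missing, run in groupby(series, key=lambda v: v is None)
        if is_missing
    ]


# Hypothetical hourly readings (not data from the paper); None marks a missing value.
pm10 = [41.0, None, None, 38.5, 40.2, None, 39.9, None, None, None]

print(missing_percentage(pm10))  # 60.0
print(gap_sizes(pm10))           # [2, 1, 3]
```

A data set with a small missing percentage but large values in the gap-size list would match the "small percentage and long gap sizes" pattern the abstract reports.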