Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters

This study uses machine learning (ML) models for high-resolution (0.1°×0.1°) prediction of the concentration of fine particulate matter (PM2.5), the air pollutant most harmful to human health, from meteorological and soil data. Iraq was chosen as the study area to implement the method. Different lags and the chang...

Bibliographic Details
Published in: Environment International
Main Author: Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
Format: Article
Language: English
Published: Elsevier Ltd 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85153537218&doi=10.1016%2fj.envint.2023.107931&partnerID=40&md5=91f7aa5b1ffbb13a2eae2389f70ff7ea
id 2-s2.0-85153537218
spelling 2-s2.0-85153537218
Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
2023
Environment International
175

10.1016/j.envint.2023.107931
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85153537218&doi=10.1016%2fj.envint.2023.107931&partnerID=40&md5=91f7aa5b1ffbb13a2eae2389f70ff7ea
This study uses machine learning (ML) models for high-resolution (0.1°×0.1°) prediction of the concentration of fine particulate matter (PM2.5), the air pollutant most harmful to human health, from meteorological and soil data. Iraq was chosen as the study area to implement the method. Different lags and changing patterns of four European Reanalysis (ERA5) meteorological variables (rainfall, mean temperature, wind speed and relative humidity) and one soil parameter (soil moisture) were considered as candidate predictors, and a suitable subset was selected using a non-greedy algorithm known as simulated annealing (SA). The selected predictors were used to simulate the temporal and spatial variability of PM2.5 concentration over Iraq during early summer (May-July), the most polluted months, using three advanced ML models: extremely randomized trees (ERT), stochastic gradient descent backpropagation (SGD-BP) and long short-term memory (LSTM) integrated with a Bayesian optimizer. The spatial distribution of annual average PM2.5 revealed that the entire population of Iraq is exposed to pollution levels above the standard limit. The changes in temperature and soil moisture, together with the mean wind speed and humidity of the month preceding early summer, can predict the temporal and spatial variability of PM2.5 over Iraq during May-July. The LSTM performed best, with a normalized root-mean-square error and Kling-Gupta efficiency of 13.4% and 0.89, compared to 16.02% and 0.81 for SGD-BP and 17.9% and 0.74 for ERT. The LSTM could also reconstruct the observed spatial distribution of PM2.5, with MapCurve and Cramer's V values of 0.95 and 0.91, compared to 0.9 and 0.86 for SGD-BP and 0.83 and 0.76 for ERT. The study provides a methodology for forecasting the spatial variability of PM2.5 concentration at high resolution during the peak pollution months from freely available data, which can be replicated in other regions to generate high-resolution PM2.5 forecasting maps.
© 2023 The Authors
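The two skill scores quoted in the abstract, the normalized root-mean-square error (NRMSE) and the Kling-Gupta efficiency (KGE), can be sketched as below. This is a minimal illustration, not the authors' code: the record does not say how the RMSE was normalized or which KGE variant was used, so the sketch assumes normalization by the mean of the observations and the original three-component KGE formulation. The function names and the sample values are hypothetical.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def nrmse_percent(obs, sim):
    """RMSE divided by the mean observation, in percent (assumed normalization)."""
    rmse = math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))
    return 100.0 * rmse / _mean(obs)


def kge(obs, sim):
    """Kling-Gupta efficiency, assuming the original three-component form.

    Combines linear correlation (r), variability ratio (alpha) and
    bias ratio (beta); a perfect simulation scores 1.
    """
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    std_obs = math.sqrt(_mean([(o - mean_obs) ** 2 for o in obs]))
    std_sim = math.sqrt(_mean([(s - mean_sim) ** 2 for s in sim]))
    cov = _mean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = cov / (std_obs * std_sim)
    alpha = std_sim / std_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical PM2.5 values, for illustration only.
observed = [10.0, 20.0, 30.0]
simulated = [12.0, 18.0, 33.0]
scores = (nrmse_percent(observed, simulated), kge(observed, simulated))
```

Lower NRMSE and higher KGE both indicate a better fit, which is the sense in which the abstract ranks LSTM ahead of SGD-BP and ERT.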
Elsevier Ltd
0160-4120
English
Article
All Open Access; Gold Open Access
author Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
spellingShingle Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
author_facet Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
author_sort Tao H.; Jawad A.H.; Shather A.H.; Al-Khafaji Z.; Rashid T.A.; Ali M.; Al-Ansari N.; Marhoon H.A.; Shahid S.; Yaseen Z.M.
title Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
title_short Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
title_full Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
title_fullStr Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
title_full_unstemmed Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
title_sort Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters
publishDate 2023
container_title Environment International
container_volume 175
container_issue
doi_str_mv 10.1016/j.envint.2023.107931
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85153537218&doi=10.1016%2fj.envint.2023.107931&partnerID=40&md5=91f7aa5b1ffbb13a2eae2389f70ff7ea
description This study uses machine learning (ML) models for high-resolution (0.1°×0.1°) prediction of the concentration of fine particulate matter (PM2.5), the air pollutant most harmful to human health, from meteorological and soil data. Iraq was chosen as the study area to implement the method. Different lags and changing patterns of four European Reanalysis (ERA5) meteorological variables (rainfall, mean temperature, wind speed and relative humidity) and one soil parameter (soil moisture) were considered as candidate predictors, and a suitable subset was selected using a non-greedy algorithm known as simulated annealing (SA). The selected predictors were used to simulate the temporal and spatial variability of PM2.5 concentration over Iraq during early summer (May-July), the most polluted months, using three advanced ML models: extremely randomized trees (ERT), stochastic gradient descent backpropagation (SGD-BP) and long short-term memory (LSTM) integrated with a Bayesian optimizer. The spatial distribution of annual average PM2.5 revealed that the entire population of Iraq is exposed to pollution levels above the standard limit. The changes in temperature and soil moisture, together with the mean wind speed and humidity of the month preceding early summer, can predict the temporal and spatial variability of PM2.5 over Iraq during May-July. The LSTM performed best, with a normalized root-mean-square error and Kling-Gupta efficiency of 13.4% and 0.89, compared to 16.02% and 0.81 for SGD-BP and 17.9% and 0.74 for ERT. The LSTM could also reconstruct the observed spatial distribution of PM2.5, with MapCurve and Cramer's V values of 0.95 and 0.91, compared to 0.9 and 0.86 for SGD-BP and 0.83 and 0.76 for ERT. The study provides a methodology for forecasting the spatial variability of PM2.5 concentration at high resolution during the peak pollution months from freely available data, which can be replicated in other regions to generate high-resolution PM2.5 forecasting maps.
© 2023 The Authors
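The description says the predictor set was chosen with simulated annealing, a non-greedy search that sometimes accepts a worse candidate so it can escape local optima. The sketch below shows the general technique on a toy problem; it is not the authors' implementation. The function `anneal_subset`, the cooling schedule, the iteration count and the toy cost function are all assumptions made for illustration. In a real application the cost would presumably be a validation error of a model trained on the selected predictors.

```python
import math
import random


def anneal_subset(n_features, cost, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a boolean feature mask that minimizes cost(mask).

    Each step flips one randomly chosen feature in or out of the subset.
    Improvements are always accepted; a worse candidate is accepted with
    probability exp(-delta / temperature), and the temperature decays
    geometrically (assumed schedule).
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(n_iter):
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling
    return best, best_cost


# Hypothetical toy problem: features 0, 2 and 5 are the informative ones.
USEFUL = {0, 2, 5}


def toy_cost(mask):
    """Penalize each missed useful feature by 1 and each extra feature by 0.5."""
    missed = sum(1 for i in USEFUL if not mask[i])
    extra = sum(1 for i, chosen in enumerate(mask) if chosen and i not in USEFUL)
    return missed + 0.5 * extra


selected_mask, selected_cost = anneal_subset(8, toy_cost)
```

Early on, the high temperature lets the search wander across subsets; as the temperature falls it behaves more and more like a greedy descent around the best region found.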
publisher Elsevier Ltd
issn 0160-4120
language English
format Article
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus
_version_ 1809678018207023104