Summary: | Air is the most crucial element for the survival of life on Earth, and the air we breathe has a profound effect on ecosystem biodiversity. Consequently, it is prudent to monitor the air quality in our environment. Data mining is one of several approaches that can be used to predict the air pollution index (API). This study therefore aimed to evaluate three types of support vector regression (linear SVR, SVR, and libSVR) in predicting air pollutant concentrations, to identify the best model, and to calculate the API using the proposed model. The study used secondary daily data from 2002 to 2020, obtained from the Department of Environment (DoE) Malaysia and recorded at the Petaling Jaya monitoring station. The work focused on six major pollutants: PM10, PM2.5, SO2, NO2, CO, and O3. The root mean square error (RMSE), mean absolute error (MAE), and relative error (RE) were used to evaluate the performance of the regression models. Experimental results showed that the best model for five of the air pollutants (PM10, PM2.5, SO2, CO, and O3) was linear SVR, with an average RMSE = 5.548, MAE = 3.490, and RE = 27.98%, because it had the lowest total rank value across RMSE, MAE, and RE. For NO2, by contrast, the best model was support vector regression (SVR), with RMSE = 0.007, MAE = 0.006, and RE = 20.75%. This work also illustrates that combining data mining with air pollutant prediction is an efficient and convenient way to address related environmental problems. The best model has the potential to be applied as an early warning system that informs local authorities about air quality, and it can reliably predict daily air pollution events over three consecutive days. Good air quality, in turn, plays a significant role in supporting biodiversity and maintaining healthy ecosystems. © 2023 Universitatea "Alexandru Ioan Cuza" din Iasi. All rights reserved.
|
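The model-selection rule described in the summary (choose the model with the lowest total rank across RMSE, MAE, and RE) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not define RE, so the sketch assumes it is the mean of |actual - predicted| / |actual| expressed as a percentage, and the function names, the tie-handling rule, and the sample values are all hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    """Mean relative error in percent.

    Assumption: RE = mean(|a - p| / |a|) * 100; the source does not give
    its formula. Requires every actual value to be non-zero.
    """
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


def total_ranks(scores):
    """Sum each model's rank over all metrics (rank 1 = lowest error).

    `scores` maps a model name to a dict of metric name -> error value.
    Assumption: tied models share the same rank.
    """
    models = list(scores)
    metrics = list(next(iter(scores.values())))
    totals = dict.fromkeys(models, 0)
    for metric in metrics:
        for model in models:
            # Rank = 1 + number of models with a strictly lower error.
            lower = sum(
                scores[other][metric] < scores[model][metric]
                for other in models
            )
            totals[model] += 1 + lower
    return totals


def best_model(scores):
    """Return the model name with the lowest total rank."""
    totals = total_ranks(scores)
    return min(totals, key=totals.get)


# Made-up error values for three hypothetical models (not the study's results).
example_scores = {
    "model_a": {"rmse": 1.0, "mae": 2.0, "re": 3.0},
    "model_b": {"rmse": 2.0, "mae": 1.0, "re": 4.0},
    "model_c": {"rmse": 3.0, "mae": 3.0, "re": 1.0},
}
print(best_model(example_scores))
```

In the study's setting, the error values for each pollutant would come from the fitted linear SVR, SVR, and libSVR models, and the ranking would be repeated per pollutant.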