Improving Air Quality Prediction Models for Banting: A Performance Evaluation of Lasso, mRMR, and ReliefF

This study explores the effectiveness of various feature selection methods in forecasting next-day PM2.5 levels in Banting, Malaysia. The accurate prediction of PM2.5 concentrations is crucial for public health, enabling authorities to take timely actions to mitigate exposure to harmful pollutants....


Bibliographic Details
Published in: INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS
Main Authors: Arafin, Siti Khadijah; Mazumdar, Suvodeep; Ibrahim, Nurain
Format: Article
Language: English
Published: SCIENCE & INFORMATION SAI ORGANIZATION LTD 2025
Subjects: Computer Science
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001441763100001
author Arafin, Siti Khadijah; Mazumdar, Suvodeep; Ibrahim, Nurain
author_facet Arafin, Siti Khadijah; Mazumdar, Suvodeep; Ibrahim, Nurain
author_sort Arafin
title Improving Air Quality Prediction Models for Banting: A Performance Evaluation of Lasso, mRMR, and ReliefF
container_title INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS
language English
format Article
description This study evaluates three feature selection methods for forecasting next-day PM2.5 levels in Banting, Malaysia. Accurate prediction of PM2.5 concentrations is crucial for public health because it enables authorities to act in time to limit exposure to harmful pollutants. The study compares Lasso, mRMR, and ReliefF using a dataset of 43,824 data points collected from Banting air quality monitoring stations (CA22B). The dataset comprises ten variables: the pollutant concentrations O3, CO, NO2, SO2, PM10, and PM2.5, and the meteorological parameters temperature, humidity, wind direction, and wind speed. Lasso outperformed both mRMR and ReliefF on several performance metrics, including accuracy, sensitivity, precision, F1 score, and AUROC. Lasso also handled multicollinearity better and improved the interpretability of the model by retaining only the most important variables. This suggests that the effectiveness of a feature selection method depends heavily on the characteristics of the dataset, such as correlations among features. The top eight features selected by the Lasso method for predicting PM2.5 levels in Banting are relative humidity, PM2.5, wind direction, ambient temperature, PM10, NO2, wind speed, and O3. These findings contribute to the growing body of knowledge on air quality prediction models and highlight the importance of choosing an appropriate feature selection method to achieve the best model performance. Future research should apply the Lasso method in other geographical regions, including urban, suburban, and rural areas, to assess the generalizability of the results.
publisher SCIENCE & INFORMATION SAI ORGANIZATION LTD
issn 2158-107X; 2156-5570
publishDate 2025
container_volume 16
container_issue 2
topic Computer Science
topic_facet Computer Science
id WOS:001441763100001
url https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001441763100001
record_format wos
collection Web of Science (WoS)
_version_ 1828987784669429760
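
The description field in this record credits Lasso with retaining only the most important variables. The sketch below illustrates how an L1 penalty can produce that behaviour, under several assumptions: it uses plain linear-regression Lasso fitted by cyclic coordinate descent, synthetic data, and hypothetical feature names (`signal_a`, `signal_b`, `noise_c`, `noise_d`) and penalty value (`lam=0.1`). It is not the authors' pipeline; the record does not state their model, software, or tuning.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows; columns and y are expected to be centred (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual after adding w_j back in.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Synthetic data (an assumption for illustration): only the first two
# columns influence the target; the last two are pure noise.
rng = random.Random(0)
names = ["signal_a", "signal_b", "noise_c", "noise_d"]  # hypothetical names
n = 300
X = [[rng.gauss(0, 1) for _ in names] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.1) for row in X]

# Centre columns and target so that no intercept term is needed.
means = [sum(row[j] for row in X) / n for j in range(len(names))]
X = [[row[j] - means[j] for j in range(len(names))] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [name for name, w in zip(names, weights) if w != 0.0]
print("selected features:", selected)
```

The soft-thresholding step is what sets weak coefficients to exactly zero, so the surviving features form the selected subset. A larger `lam` removes more features, and a classification target like the one implied by the reported accuracy and AUROC metrics would need an L1-penalised classifier instead of this regression form.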