Reduced Noise SMOTE in Machine Learning Model: Application in Water Quality Classification with Imbalanced Datasets
Achieving accurate classification in imbalanced datasets, especially for environmental data such as water quality assessment, is a major challenge for machine learning classifiers. This study introduces the Reduced Noise-Synthetic Minority Oversampling Technique (RN-SMOTE) to address the problems of...
Published in: | 2024 5th International Conference on Artificial Intelligence and Data Sciences, AiDAS 2024 - Proceedings |
---|---|
Main Author: | Nasaruddin N.; Masseran N.; Idris W.M.R.; Ul-Saufie A.Z. |
Format: | Conference paper |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers Inc. 2024 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209693380&doi=10.1109%2fAiDAS63860.2024.10730391&partnerID=40&md5=2676def56948cc6cbe10c9ca564deb1c |
id |
2-s2.0-85209693380 |
---|---|
author |
Nasaruddin N.; Masseran N.; Idris W.M.R.; Ul-Saufie A.Z. |
title |
Reduced Noise SMOTE in Machine Learning Model: Application in Water Quality Classification with Imbalanced Datasets |
publishDate |
2024 |
container_title |
2024 5th International Conference on Artificial Intelligence and Data Sciences, AiDAS 2024 - Proceedings |
container_volume |
|
container_issue |
|
doi_str_mv |
10.1109/AiDAS63860.2024.10730391 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209693380&doi=10.1109%2fAiDAS63860.2024.10730391&partnerID=40&md5=2676def56948cc6cbe10c9ca564deb1c |
description |
Achieving accurate classification on imbalanced datasets, especially for environmental data such as water quality assessments, is a major challenge for machine learning classifiers. This study introduces the Reduced Noise-Synthetic Minority Oversampling Technique (RN-SMOTE) to address the underrepresentation of minority classes and the presence of noise in imbalanced multiclass datasets for Water Quality Classification (WQC). Current state-of-the-art techniques, such as the standard Synthetic Minority Oversampling Technique (SMOTE) and its variants, improve class balance but often fail to adequately address noise in synthetic samples. Our research extends RN-SMOTE, previously applied to binary data, to multiclass scenarios. RN-SMOTE improves classification performance by oversampling the minority classes and eliminating noisy synthetic instances. We evaluate the effectiveness of RN-SMOTE using three classifiers: Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The experimental results reveal that RN-SMOTE significantly improves classification accuracy and sensitivity. For instance, the RF classifier with RN-SMOTE achieved an accuracy of 71.17% and sensitivities of 75.24%, 69.23%, and 72.14% for the clean, slightly polluted, and polluted classes, respectively, outperforming both the model trained on the original dataset and the model trained with traditional SMOTE. However, RN-SMOTE did not outperform traditional SMOTE for the DT and XGBoost models. Applying RN-SMOTE to multiclass water quality data extends its utility and advances imbalanced classification in environmental science. © 2024 IEEE. |
publisher |
Institute of Electrical and Electronics Engineers Inc. |
issn |
|
language |
English |
format |
Conference paper |
accesstype |
|
record_format |
scopus |
collection |
Scopus |
_version_ |
1818940553964814336 |
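
The abstract describes the approach only at a high level: oversample the minority classes, then eliminate noisy synthetic instances. The following is a minimal sketch of that two-step idea for a multiclass problem, not the authors' implementation. The function names (`smote_like`, `drop_noisy`, `oversample_multiclass`), the nearest-neighbour agreement rule used as the noise filter, the 0.5 threshold, and the toy three-class data are all assumptions made for illustration; the record does not specify which noise-detection step RN-SMOTE uses.

```python
import numpy as np

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest neighbours of the same class."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]           # k nearest same-class neighbours
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def drop_noisy(X_syn, label, X_ref, y_ref, k):
    """Assumed noise filter: keep a synthetic point only if at least half of
    its k nearest ORIGINAL samples carry the same class label."""
    d = np.linalg.norm(X_syn[:, None, :] - X_ref[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    agree = (y_ref[nn] == label).mean(axis=1)
    return X_syn[agree >= 0.5]

def oversample_multiclass(X, y, k=3, seed=0):
    """Oversample every non-majority class towards the majority count,
    filtering each batch of synthetic points before it is added."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [X], [y]
    for c, n in zip(classes, counts):
        if n == target or n < 2:                # nothing to add / cannot interpolate
            continue
        X_syn = smote_like(X[y == c], target - n, k, rng)
        X_syn = drop_noisy(X_syn, c, X, y, k)
        X_out.append(X_syn)
        y_out.append(np.full(len(X_syn), c))
    return np.vstack(X_out), np.concatenate(y_out)

# Toy stand-in for three water quality classes (hypothetical data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (60, 2)),
               rng.normal(3, 1, (20, 2)),
               rng.normal(6, 1, (10, 2))])
y = np.array([0] * 60 + [1] * 20 + [2] * 10)

X_res, y_res = oversample_multiclass(X, y)
print("class counts before:", np.bincount(y))
print("class counts after: ", np.bincount(y_res))
```

Because the filter discards some synthetic points, the resampled classes may end up slightly smaller than the majority class. The resampled data would be used only for training a classifier such as DT, RF, or XGBoost, with per-class sensitivity measured on an untouched test split.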