Reduced Noise SMOTE in Machine Learning Model: Application in Water Quality Classification with Imbalanced Datasets

Achieving accurate classification in imbalanced datasets, especially for environmental data such as water quality assessment, is a major challenge for machine learning classifiers. This study introduces the Reduced Noise-Synthetic Minority Oversampling Technique (RN-SMOTE) to address the problems of...

Bibliographic Details
Published in: 2024 5th International Conference on Artificial Intelligence and Data Sciences, AiDAS 2024 - Proceedings
Main Authors: Nasaruddin N.; Masseran N.; Idris W.M.R.; Ul-Saufie A.Z.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209693380&doi=10.1109%2fAiDAS63860.2024.10730391&partnerID=40&md5=2676def56948cc6cbe10c9ca564deb1c
id 2-s2.0-85209693380
10.1109/AiDAS63860.2024.10730391
Achieving accurate classification in imbalanced datasets, especially for environmental data such as water quality assessment, is a major challenge for machine learning classifiers. This study introduces the Reduced Noise-Synthetic Minority Oversampling Technique (RN-SMOTE) to address the problems of underrepresentation of minority classes and noise in imbalanced multiclass datasets for Water Quality Classification (WQC). Current state-of-the-art techniques, such as the standard Synthetic Minority Oversampling Technique (SMOTE) and its variants, improve class balance but often fail to adequately address noise in synthetic samples. Our research extends RN-SMOTE, previously applied to binary data, to multiclass scenarios. RN-SMOTE improves classification performance by oversampling the minority class and eliminating noisy synthetic instances. We evaluate the effectiveness of RN-SMOTE using three classifiers: Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The experimental results reveal that RN-SMOTE significantly improves classification accuracy and sensitivity. For instance, the RF classifier with RN-SMOTE achieved an accuracy of 71.17% and sensitivities of 75.24%, 69.23% and 72.14% for the clean, slightly polluted and polluted classes, respectively, outperforming RF on the original dataset and with traditional SMOTE. However, RN-SMOTE did not outperform traditional SMOTE for the DT and XGBoost models. Applying RN-SMOTE to multiclass water quality data extends its utility and advances imbalanced classification in environmental science. © 2024 IEEE.
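Illustrative note: the abstract describes two steps, oversampling the minority class and then eliminating noisy synthetic instances. The sketch below shows that general oversample-then-filter pattern in plain Python for a single minority class against the rest. It is not the authors' implementation: the record does not say how RN-SMOTE detects noise, so the nearest-neighbour vote in `filter_noisy` is a hypothetical stand-in, and the function names, the toy points and the parameter `k` are all assumptions made for this example.

```python
import math
import random


def oversample(minority, n_new, k=3, seed=0):
    """SMOTE-style step: interpolate between a minority point and one of
    its k nearest minority neighbours to create n_new synthetic points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def filter_noisy(synthetic, minority, majority, k=3):
    """Assumed noise filter (not taken from the paper): keep a synthetic
    point only if most of its k nearest original points are minority."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    kept = []
    for s in synthetic:
        nearest = sorted(labelled, key=lambda pl: math.dist(s, pl[0]))[:k]
        if 2 * sum(label for _, label in nearest) > k:
            kept.append(s)
    return kept


# Toy data: three clustered minority points plus one minority outlier
# sitting inside the majority cluster.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
majority = [(5.1, 5.0), (4.9, 5.2), (5.2, 4.8), (4.8, 4.9)]

synthetic = oversample(minority, n_new=20)
kept = filter_noisy(synthetic, minority, majority)
print(len(synthetic), "synthetic points,", len(kept), "kept after filtering")
```

For a multiclass setting such as the three water quality classes named in the abstract, one plausible arrangement would be to repeat this per minority class, treating all other classes as the "majority" side of the vote; that arrangement is likewise an assumption, not a detail given in this record.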
record_format scopus
collection Scopus