Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques
Machine Learning (ML) has been used for a long time and has gained wide attention over the last several years. It can handle a large amount of data and allow non-linear structures by using complex mathematical computations. However, traditional ML models do suffer some problems, such as high bias an...
Published in: | Water (Switzerland) |
---|---|
Main Author: | Malek N.H.A.; Yaacob W.F.W.; Nasir S.A.M.; Shaadan N. |
Format: | Article |
Language: | English |
Published: | MDPI 2022 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85127850257&doi=10.3390%2fw14071067&partnerID=40&md5=3c4c149d8a5dcf4ac9600018e760ff1e |
id |
2-s2.0-85127850257 |
---|---|
author |
Malek N.H.A.; Yaacob W.F.W.; Nasir S.A.M.; Shaadan N. |
title |
Prediction of Water Quality Classification of the Kelantan River Basin, Malaysia, Using Machine Learning Techniques |
publishDate |
2022 |
container_title |
Water (Switzerland) |
container_volume |
14 |
container_issue |
7 |
doi_str_mv |
10.3390/w14071067 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85127850257&doi=10.3390%2fw14071067&partnerID=40&md5=3c4c149d8a5dcf4ac9600018e760ff1e |
description |
Machine Learning (ML) has been used for a long time and has gained wide attention over the last several years. It can handle a large amount of data and allow non-linear structures by using complex mathematical computations. However, traditional ML models do suffer some problems, such as high bias and overfitting. Therefore, this has resulted in the advancement and improvement of ML techniques, such as the bagging and boosting approach, to address these problems. This study explores a series of ML models to predict the water quality classification (WQC) in the Kelantan River using data from 2005 to 2020. The proposed methodology employed 13 physical and chemical parameters of water quality and 7 ML models that are Decision Tree, Artificial Neural Networks, K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Random Forest and Gradient Boosting. Based on the analysis, the ensemble model of Gradient Boosting with a learning rate of 0.1 exhibited the best prediction performance compared to the other algorithms. It had the highest accuracy (94.90%), sensitivity (80.00%) and f-measure (86.49%), with the lowest classification error. Total Suspended Solid (TSS) was the most significant variable for the Gradient Boosting (GB) model to predict WQC, followed by Ammoniacal Nitrogen (NH3N), Biochemical Oxygen Demand (BOD) and Chemical Oxygen Demand (COD). Based on the accurate water quality prediction, the results could help to improve the National Environmental Policy regarding water resources by continuously improving water quality. © 2022 by the authors. Licensee MDPI, Basel, Switzerland. |
publisher |
MDPI |
issn |
20734441 |
language |
English |
format |
Article |
accesstype |
All Open Access; Gold Open Access |
record_format |
scopus |
collection |
Scopus |
_version_ |
1828987867949432832 |
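
The description above reports accuracy, sensitivity and f-measure for the Gradient Boosting model. As a minimal sketch of how these three figures relate to one another, the Python snippet below computes them from a binary confusion matrix. The function name `classification_metrics` and the four counts are hypothetical: the counts were chosen only because they reproduce the reported percentages and are not taken from the study, which may have used a different test set or a multi-class setting.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "classification_error": 1 - accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Hypothetical counts (an assumption, not data from the paper).
metrics = classification_metrics(tp=32, fp=2, fn=8, tn=154)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative counts, accuracy, sensitivity and f-measure round to 94.90%, 80.00% and 86.49%. Because accuracy and classification error sum to one, the model with the highest accuracy is necessarily the one with the lowest classification error.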