PREDICTION OF WATER QUALITY FOR THE SELANGOR RIVERS USING DATA MINING APPROACH
There are few studies using the data mining approach to assess water quality, especially for the Selangor rivers. This study assesses water quality using data mining techniques and identifies the most significant variables that affect it. Machine learning techniques used are Decisi...
Published in: | Journal of Sustainability Science and Management |
---|---|
Main Author: | Ibrahim N. |
Format: | Article |
Language: | English |
Published: | Universiti Malaysia Terengganu 2023 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85172922072&doi=10.46754%2fjssm.2023.09.0012&partnerID=40&md5=399c58c33b51a7893e46becc1ee7878f |
id |
2-s2.0-85172922072 |
---|---|
author |
Ibrahim N.; Rahman H.A.A.; Azran A.A.; Faddillah M.A.M.; Qamarudin M.A.Q.M. |
title |
PREDICTION OF WATER QUALITY FOR THE SELANGOR RIVERS USING DATA MINING APPROACH |
publishDate |
2023 |
container_title |
Journal of Sustainability Science and Management |
container_volume |
18 |
container_issue |
9 |
doi_str_mv |
10.46754/jssm.2023.09.0012 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85172922072&doi=10.46754%2fjssm.2023.09.0012&partnerID=40&md5=399c58c33b51a7893e46becc1ee7878f |
description |
There are few studies using the data mining approach to assess water quality, especially for the Selangor rivers. This study assesses water quality using data mining techniques and identifies the most significant variables that affect it. The machine learning techniques used are Decision Tree (Gini and Entropy); Logistic Regression with the Enter, Backward Elimination and Forward Selection methods; and Artificial Neural Network with 4 and 8 hidden nodes. The study found that Logistic Regression Enter is the best model, since it is neither underfit nor overfit, with sensitivity, specificity, accuracy, mean squared error and misclassification rate values of 92.51%, 97.45%, 96.36%, 0.028 and 3.64%, respectively. The next best models are Decision Tree (Gini) and Artificial Neural Network with 4 hidden nodes. According to the variable importance output of Decision Tree (Gini), the most important variable affecting water quality is Biochemical Oxygen Demand (BOD), with the highest value of 0.2284, followed by Chemical Oxygen Demand, with a value of 0.1471. © Penerbit UMT |
publisher |
Universiti Malaysia Terengganu |
issn |
18238556 |
language |
English |
format |
Article |
accesstype |
All Open Access; Bronze Open Access |
record_format |
scopus |
collection |
Scopus |
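The description above reports sensitivity, specificity, accuracy and misclassification rate for the fitted models. As a minimal sketch of how such measures are usually derived from a binary confusion matrix, the Python below may help; the function name `classification_metrics` and the counts passed to it are hypothetical illustrations, not data or code from the article. Note that the reported accuracy (96.36%) and misclassification rate (3.64%) sum to 100%, which matches the relationship shown here.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute common measures from binary confusion-matrix counts.

    tp/fn: positive cases predicted correctly / incorrectly
    tn/fp: negative cases predicted correctly / incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "accuracy": (tp + tn) / total,           # share of correct predictions
        "misclassification_rate": (fp + fn) / total,  # equals 1 - accuracy
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fn=10, tn=180, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because accuracy and misclassification rate are complements, a model comparison needs only one of them; sensitivity and specificity add information about which class the errors fall in.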