PREDICTION OF WATER QUALITY FOR THE SELANGOR RIVERS USING DATA MINING APPROACH

Bibliographic Details
Published in: Journal of Sustainability Science and Management
Authors: Ibrahim N.; Rahman H.A.A.; Azran A.A.; Faddillah M.A.M.; Qamarudin M.A.Q.M.
Format: Article
Language: English
Published: Universiti Malaysia Terengganu, 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85172922072&doi=10.46754%2fjssm.2023.09.0012&partnerID=40&md5=399c58c33b51a7893e46becc1ee7878f
Volume: 18
Issue: 9
DOI: 10.46754/jssm.2023.09.0012
ISSN: 1823-8556
Access type: All Open Access; Bronze Open Access
Scopus EID: 2-s2.0-85172922072

Abstract

Few studies have used a data mining approach to assess water quality, especially for Selangor rivers. This study assesses water quality using data mining techniques and identifies the variables that most significantly affect it. The machine learning techniques used are Decision Tree (Gini and Entropy), Logistic Regression (Enter, Backward Elimination and Forward Selection) and Artificial Neural Network with 4 and 8 hidden nodes. Logistic Regression (Enter) was the best model: it neither underfits nor overfits, with sensitivity, specificity, accuracy, mean squared error and misclassification rate of 92.51%, 97.45%, 96.36%, 0.028 and 3.64%, respectively. Two other models also performed well: Decision Tree (Gini) and Artificial Neural Network with 4 hidden nodes. According to the variable importance output of Decision Tree (Gini), the variable with the greatest effect on water quality is Biochemical Oxygen Demand (BOD), with an importance value of 0.2284, followed by Chemical Oxygen Demand (COD) at 0.1471. © Penerbit UMT
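The evaluation measures named in the abstract are usually defined from a binary confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' code: it assumes a binary label (1 = positive class, 0 = negative class; which water-quality class counts as positive is an assumption), predicted probabilities for the positive class, a hypothetical 0.5 decision threshold, and mean squared error taken between predicted probabilities and the 0/1 labels. The hypothetical function name `classification_metrics` and the toy inputs are illustrative only and are not the study's data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier (sketch).

    y_true: observed labels, 1 = positive class, 0 = negative class (assumed coding).
    y_prob: predicted probabilities of the positive class.
    threshold: hypothetical cut-off for turning probabilities into class labels.
    """
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")

    # Convert probabilities to hard class predictions.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Count the four confusion-matrix cells.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    n = len(y_true)
    accuracy = (tp + tn) / n
    return {
        # Share of actual positives predicted as positive.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of actual negatives predicted as negative.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": accuracy,
        # Misclassification rate is the complement of accuracy.
        "misclassification_rate": 1 - accuracy,
        # Mean squared error between predicted probabilities and 0/1 labels.
        "mse": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    metrics = classification_metrics([1, 1, 0, 0, 0], [0.9, 0.4, 0.2, 0.1, 0.6])
    # Expected values (rounded): sensitivity 0.5, specificity 0.667,
    # accuracy 0.6, misclassification rate 0.4, mse 0.156
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Under these definitions accuracy and misclassification rate always sum to 100%, which matches the reported 96.36% and 3.64%.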