Machine Learning Models for Predicting Flood Events Using Weather Data: An Evaluation of Logistic Regression, LightGBM, and XGBoost

This study examines flood prediction in Jakarta, Indonesia, a pressing concern due to its significant implications for public safety and urban management. Machine Learning (ML) presents promising methodologies for accurately forecasting floods by leveraging weather data. However, flood prediction in...

Bibliographic Details
Published in:	Journal of Applied Data Sciences
Main Author:	Maharina; Paryono T.; Fauzi A.; Indra J.; Sihabudin; Harahap M.K.; Rizki L.T.
Format:	Article
Language:	English
Published:	Bright Publisher 2025
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216728861&doi=10.47738%2fjads.v6i1.503&partnerID=40&md5=9e496ad42db5aec61fb9d0a9595be0a9

id	2-s2.0-85216728861
author	Maharina; Paryono T.; Fauzi A.; Indra J.; Sihabudin; Harahap M.K.; Rizki L.T.
title	Machine Learning Models for Predicting Flood Events Using Weather Data: An Evaluation of Logistic Regression, LightGBM, and XGBoost
publishDate	2025
container_title	Journal of Applied Data Sciences
container_volume	6
container_issue	1
doi_str_mv	10.47738/jads.v6i1.503
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216728861&doi=10.47738%2fjads.v6i1.503&partnerID=40&md5=9e496ad42db5aec61fb9d0a9595be0a9
description	This study examines flood prediction in Jakarta, Indonesia, a pressing concern due to its significant implications for public safety and urban management. Machine Learning (ML) presents promising methodologies for accurately forecasting floods by leveraging weather data. However, flood prediction in Jakarta remains challenging due to the city’s highly variable weather patterns, including fluctuations in rainfall, humidity, temperature, and wind characteristics. Existing methods often struggle with these complexities, as they rely on traditional ML models such as K-Nearest Neighbors (KNN), which may not capture certain patterns or provide high accuracy and robustness. Therefore, this study proposes three ML methods—Logistic Regression (LR), LightGBM, and XGBoost—to predict floods accurately. Five performance metrics (i.e., accuracy, area under the curve (AUC), precision, recall, and F1-score) were used to measure and compare the accuracy of the algorithms. The proposed method consists of three main processes. The first process involves data preprocessing and evaluation using 14 different ML models. In the second process, additional feature engineering is applied to improve the quality of the data. Finally, the third process combines the previous steps with oversampling techniques and cross-validation methods. This structured approach aims to enhance the overall performance of the analysis. The experimental results show that Process 3 significantly improves performance compared to Processes 1 and 2. The model predicts floods with an accuracy score of 93.82% for LR, 96.67% for XGBoost, and 96.81% for LightGBM, respectively. Thus, the proposed model offers a solution for operational decision-making in flood risk management, including flood mitigation planning. © 2025, Bright Publisher. All rights reserved.
publisher	Bright Publisher
issn	27236471
language	English
format	Article
accesstype
record_format	scopus
collection	Scopus
_version_	1825722576124510208
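
The description field above names five evaluation metrics (accuracy, AUC, precision, recall, and F1-score) without showing how they are computed. The following is a minimal sketch in plain Python, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the record does not say which library or threshold the study used.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means 'flood'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute the five metrics from true labels and predicted flood probabilities."""
    # The threshold is an assumption; scores at or above it are predicted as floods.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, scores),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = flood day, 0 = no flood.
    toy_labels = [1, 0, 1, 0, 0, 1]
    toy_scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
    for name, value in evaluate(toy_labels, toy_scores).items():
        print(f"{name}: {value:.4f}")
```

On imbalanced flood data, accuracy alone can look high even when few floods are detected, which is presumably why the study reports recall, precision, F1-score, and AUC alongside it.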
