Machine Learning Models for Predicting Flood Events Using Weather Data: An Evaluation of Logistic Regression, LightGBM, and XGBoost


Bibliographic Details
Published in: Journal of Applied Data Sciences
Main Authors: Maharina; Paryono T.; Fauzi A.; Indra J.; Sihabudin; Harahap M.K.; Rizki L.T.
Format: Article
Language: English
Published: Bright Publisher, 2025
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216728861&doi=10.47738%2fjads.v6i1.503&partnerID=40&md5=9e496ad42db5aec61fb9d0a9595be0a9
Volume: 6
Issue: 1
DOI: 10.47738/jads.v6i1.503
This study examines flood prediction in Jakarta, Indonesia, a pressing concern because of its implications for public safety and urban management. Machine Learning (ML) offers promising methods for forecasting floods from weather data. However, flood prediction in Jakarta remains challenging because of the city’s highly variable weather patterns, including fluctuations in rainfall, humidity, temperature, and wind characteristics. Existing methods often struggle with these complexities because they rely on traditional ML models such as K-Nearest Neighbors (KNN), which may fail to capture certain patterns or to deliver high accuracy and robustness. This study therefore proposes three ML methods, Logistic Regression (LR), LightGBM, and XGBoost, to predict floods. Five performance metrics (accuracy, area under the curve (AUC), precision, recall, and F1-score) were used to measure and compare the algorithms. The proposed method consists of three main processes. The first process involves data preprocessing and evaluation using 14 different ML models. In the second process, additional feature engineering is applied to improve the quality of the data. The third process combines the previous steps with oversampling techniques and cross-validation methods. This structured approach aims to improve overall predictive performance. The experimental results show that Process 3 significantly improves performance compared to Processes 1 and 2. The models predict floods with accuracy scores of 93.82% for LR, 96.67% for XGBoost, and 96.81% for LightGBM. Thus, the proposed approach offers a solution for operational decision-making in flood risk management, including flood mitigation planning. © 2025, Bright Publisher. All rights reserved.
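The five metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names (`confusion_counts`, `roc_auc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. The label 1 is taken to mean "flood", and AUC is computed as the fraction of (flood, no-flood) pairs in which the flood case receives the higher score.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = flood)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:  # t == 1 and p == 0
            counts["fn"] += 1
    return counts


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return accuracy, AUC, precision, recall, and F1 for predicted flood scores."""
    # Hypothetical decision rule: predict a flood when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),
        "auc": roc_auc(y_true, scores),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up labels and model scores, not data from the study.
    toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
    toy_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
    for name, value in evaluate(toy_labels, toy_scores).items():
        print(f"{name}: {value:.4f}")
```

Reporting precision, recall, and F1 alongside accuracy matters when flood days are much rarer than non-flood days, because accuracy alone can look high for a model that rarely predicts a flood. The abstract's use of oversampling suggests such an imbalance, though it does not state the class ratio.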
ISSN: 2723-6471
