Machine Learning Models for Predicting Flood Events Using Weather Data: An Evaluation of Logistic Regression, LightGBM, and XGBoost

Bibliographic Details
Published in: Journal of Applied Data Sciences
Main Author: Maharina; Paryono T.; Fauzi A.; Indra J.; Sihabudin; Harahap M.K.; Rizki L.T.
Format: Article
Language: English
Published: Bright Publisher 2025
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85216728861&doi=10.47738%2fjads.v6i1.503&partnerID=40&md5=9e496ad42db5aec61fb9d0a9595be0a9
Description
Summary: This study examines flood prediction in Jakarta, Indonesia, a pressing concern with significant implications for public safety and urban management. Machine Learning (ML) offers promising methods for forecasting floods from weather data. However, flood prediction in Jakarta remains challenging because of the city’s highly variable weather, including fluctuations in rainfall, humidity, temperature, and wind characteristics. Existing methods often struggle with this variability because they rely on traditional ML models such as K-Nearest Neighbors (KNN), which may miss certain patterns and fall short on accuracy and robustness. This study therefore proposes three ML methods, Logistic Regression (LR), LightGBM, and XGBoost, for flood prediction. Five performance metrics (accuracy, area under the curve (AUC), precision, recall, and F1-score) were used to measure and compare the algorithms. The proposed method consists of three processes. Process 1 involves data preprocessing and evaluation using 14 different ML models. Process 2 applies additional feature engineering to improve data quality. Process 3 combines the previous steps with oversampling and cross-validation. This structured approach aims to improve overall predictive performance. The experimental results show that Process 3 significantly improves performance compared to Processes 1 and 2. The models predict floods with accuracy scores of 93.82% for LR, 96.67% for XGBoost, and 96.81% for LightGBM. The proposed model thus supports operational decision-making in flood risk management, including flood mitigation planning. © 2025, Bright Publisher. All rights reserved.
ISSN: 2723-6471
DOI: 10.47738/jads.v6i1.503
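
Illustrative note: the summary names five evaluation metrics (accuracy, AUC, precision, recall, and F1-score) without showing how they are computed. The sketch below is one minimal way to compute them for a binary flood / no-flood classifier using only the Python standard library. It is not the authors' code: the function names, the 0.5 decision threshold, and the sample labels and scores are all assumptions made for illustration, and the sample values are not data from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 means 'flood'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute the five metrics from true labels and predicted flood probabilities."""
    # Assumed decision rule: predict 'flood' when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "auc": roc_auc(y_true, y_score),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up example: 8 days, 4 with floods (label 1), and hypothetical model scores.
sample_labels = [1, 0, 1, 1, 0, 0, 1, 0]
sample_scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
sample_metrics = evaluate(sample_labels, sample_scores)
print(sample_metrics)
```

With these made-up inputs, accuracy, precision, recall, and F1 all come out to 0.75 and AUC to 0.9375. In practice a library such as scikit-learn would normally supply these metrics; the hand-written versions are kept here only so the definitions are visible.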