Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data

Machine learning algorithms (ML) are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and algorith...

Bibliographic Details
Published in: Scientific Reports
Main Author: Ong S.Q.; Isawasan P.; Ngesom A.M.M.; Shahar H.; Lasim A.M.; Nair G.
Format: Article
Language: English
Published: Nature Research 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175876072&doi=10.1038%2fs41598-023-46342-2&partnerID=40&md5=3beb9c18199661d6b555b8af98ee5fa7
id 2-s2.0-85175876072
spelling 2-s2.0-85175876072
Ong S.Q.; Isawasan P.; Ngesom A.M.M.; Shahar H.; Lasim A.M.; Nair G.
Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data
2023
Scientific Reports
13
1
10.1038/s41598-023-46342-2
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175876072&doi=10.1038%2fs41598-023-46342-2&partnerID=40&md5=3beb9c18199661d6b555b8af98ee5fa7
Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and algorithms that have higher performance. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including ensemble ML methods, and compared their performance using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of the importance of the variables showed that the container index was the least important. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our results provide a framework for future studies on the use of predictive models in the development of an early warning system. © 2023, The Author(s).
Nature Research
20452322
English
Article
All Open Access; Gold Open Access; Green Open Access
author Ong S.Q.; Isawasan P.; Ngesom A.M.M.; Shahar H.; Lasim A.M.; Nair G.
title Predicting dengue transmission rates by comparing different machine learning models with vector indices and meteorological data
publishDate 2023
container_title Scientific Reports
container_volume 13
container_issue 1
doi_str_mv 10.1038/s41598-023-46342-2
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85175876072&doi=10.1038%2fs41598-023-46342-2&partnerID=40&md5=3beb9c18199661d6b555b8af98ee5fa7
description Machine learning (ML) algorithms are receiving a lot of attention in the development of predictive models for monitoring dengue transmission rates. Previous work has focused only on specific weather variables and algorithms, and there is still a need for a model that uses more variables and algorithms that have higher performance. In this study, we use vector indices and meteorological data as predictors to develop the ML models. We trained and validated seven ML algorithms, including ensemble ML methods, and compared their performance using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy and F1 score. Our results show that ensemble ML methods such as XGBoost, AdaBoost and Random Forest perform better than logistic regression, Naïve Bayes, decision tree, and support vector machine (SVM), with XGBoost having the highest AUC, accuracy and F1 score. Analysis of the importance of the variables showed that the container index was the least important. By removing this variable, the ML models improved their performance by at least 6% in AUC and F1 score. Our results provide a framework for future studies on the use of predictive models in the development of an early warning system. © 2023, The Author(s).
publisher Nature Research
issn 20452322
language English
format Article
accesstype All Open Access; Gold Open Access; Green Open Access
record_format scopus
collection Scopus
_version_ 1820775444054540288
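
Note on the evaluation metrics named in the abstract: the study compares models by AUC, accuracy and F1 score. The sketch below shows one way these three metrics could be computed for a binary classifier, using only the Python standard library. It is not the authors' pipeline; the labels, scores, and the 0.5 decision threshold are invented purely for illustration and are not data from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = outbreak period, 0 = no outbreak.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]  # made-up model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("ROC AUC: ", roc_auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason to report all three together when comparing models.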