Summary: | Misinformation about the COVID-19 pandemic has drawn heightened attention because a substantial share of the public is exposed to false and unsubstantiated narratives about the crisis. This research uses a dataset sourced from Codalab comprising 8,560 tweets, of which 4,480 are labelled real and 4,080 fake. It compares machine learning models, namely logistic regression (LR) and random forest (RF), with deep learning models, namely Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM). In addition to the model comparison, experiments analyze the impact of different data splits (70:30, 80:20, and 90:10), batch sizes (16, 32, and 64), and numbers of epochs (5, 10, and 15) on model performance, giving insight into the optimal configurations for the models. All models achieved high accuracy: logistic regression 92%, random forest 91%, Bi-LSTM 93%, and CNN 95%. These findings highlight the potential of deep learning models, particularly CNN, for accurately detecting fake news in COVID-19-related tweets. © 2023 IEEE.
|
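The experimental setup described in the summary can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' code: it assumes the first number of each split is the training share, and it assumes the three factors are crossed in a full grid (the summary does not say whether they were varied jointly or one at a time). The function names `split_sizes`, `experiment_grid`, and `majority_baseline` are hypothetical.

```python
from itertools import product

# Figures taken from the summary.
TOTAL_TWEETS = 8560
REAL_TWEETS = 4480
FAKE_TWEETS = 4080

SPLITS = [(70, 30), (80, 20), (90, 10)]  # assumed to mean train:held-out
BATCH_SIZES = [16, 32, 64]
EPOCHS = [5, 10, 15]


def split_sizes(total, train_pct):
    """Return (train, held_out) tweet counts for a given training percentage."""
    train = total * train_pct // 100
    return train, total - train


def experiment_grid():
    """Enumerate every combination of split, batch size, and epoch count.

    Assumption: a full factorial grid. The source may instead have varied
    one factor at a time.
    """
    return [
        {"split": split, "batch_size": batch, "epochs": epochs}
        for split, batch, epochs in product(SPLITS, BATCH_SIZES, EPOCHS)
    ]


def majority_baseline(real, fake):
    """Accuracy obtained by always predicting the larger class."""
    return max(real, fake) / (real + fake)


if __name__ == "__main__":
    for train_pct, held_out_pct in SPLITS:
        train, held_out = split_sizes(TOTAL_TWEETS, train_pct)
        print(f"{train_pct}:{held_out_pct} -> {train} train, {held_out} held out")
    print(f"configurations in a full grid: {len(experiment_grid())}")
    baseline = majority_baseline(REAL_TWEETS, FAKE_TWEETS)
    print(f"majority-class baseline accuracy: {baseline:.3f}")
```

The majority-class baseline is included because the dataset is nearly balanced, so the reported accuracies of 91% to 95% sit well above what always predicting "real" would give.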