Summary: Ecosystems depend on biodiversity, which provides ecological services, helps mitigate the effects of natural disasters, supplies economically essential goods, and offers aesthetic and cultural benefits. However, the role of biodiversity in providing the necessities for human existence and well-being is not widely recognized. Although regional and fragmented analyses have been conducted, a comprehensive analysis technique has proven difficult to develop. One approach to improving sentiment analysis is to use supervised machine learning classifiers, which can learn and adapt to the nuances of language. This study therefore conducts a comparative analysis of three machine learning methods, namely Support Vector Machine, Logistic Regression, and Naïve Bayes, to determine the most effective classifier for sentiment analysis of tweets discussing biodiversity and related topics. The results show that Logistic Regression with Bag-of-Words (BoW) feature extraction is the best-performing combination on the biodiversity datasets, with an accuracy of 75.35%. The study also highlights the importance of feature extraction in optimizing machine learning models: BoW slightly outperforms TF-IDF, yielding higher accuracy for the Logistic Regression classifier. In addition, model performance improves as the sample size increases, and on a 2,500-tweet sample the combination of Logistic Regression and BoW again performs best, achieving 72.2% accuracy. Future research could explore ensemble learning, which combines multiple machine learning algorithms, for domain-specific sentiment analysis on social media networks. © 2023 IEEE.
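The abstract contrasts two feature-extraction schemes, Bag-of-Words and TF-IDF. As a minimal illustration, and not the study's own implementation, the sketch below builds both representations for a few illustrative tweets (invented for this example, not taken from the study's datasets) using only the Python standard library. The whitespace tokenizer, the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`, `tf_idf`), and the unsmoothed formula idf(t) = ln(N / df(t)) are assumptions; libraries such as scikit-learn apply idf smoothing and vector normalization by default, so their numbers differ.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed naive tokenizer: lowercase and split on whitespace.
    # Real tweet preprocessing (URLs, mentions, hashtags, stop words) is omitted.
    return text.lower().split()


def build_vocabulary(docs: list[str]) -> list[str]:
    # Sorted distinct tokens; each token's position is its feature column.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(docs: list[str], vocab: list[str]) -> list[list[int]]:
    # Each document becomes a vector of raw term counts over the vocabulary.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tf_idf(docs: list[str], vocab: list[str]) -> list[list[float]]:
    # Raw counts reweighted by an unsmoothed inverse document frequency,
    # idf(t) = ln(N / df(t)); a term present in every document gets weight 0.
    n_docs = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    df = {term: sum(term in toks for toks in token_sets) for term in vocab}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    counts = bag_of_words(docs, vocab)
    return [[c * idf[term] for c, term in zip(row, vocab)] for row in counts]


# Illustrative tweets (hypothetical, not from the study).
tweets = [
    "save biodiversity save forests",
    "biodiversity loss is alarming",
    "forests support biodiversity",
]
vocab = build_vocabulary(tweets)
bow = bag_of_words(tweets, vocab)
tfidf = tf_idf(tweets, vocab)

print(vocab)   # → ['alarming', 'biodiversity', 'forests', 'is', 'loss', 'save', 'support']
print(bow[0])  # → [0, 1, 1, 0, 0, 2, 0]
print(tfidf[0])
```

Because `biodiversity` appears in every example tweet, it keeps its count under BoW but receives an idf of zero and drops out of the TF-IDF vectors. Either matrix could then be fed to a classifier such as Logistic Regression.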