Stream ETL framework for Twitter-based sentiment analysis: Leveraging big data technologies

Bibliographic Details
Published in: EXPERT SYSTEMS WITH APPLICATIONS
Main Authors: Ismail, Azlan; Sazali, Faris Haziq; Jawaddi, Siti Nuraishah Agos; Mutalib, Sofianita
Format: Article
Language: English
Published: PERGAMON-ELSEVIER SCIENCE LTD 2025
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001341067900001
Description
Summary: Twitter has emerged as a rich source of real-time data, providing valuable insights across domains such as healthcare and business. Sentiment analysis is crucial for understanding public reactions expressed on Twitter, empowering organizations to make informed decisions. However, efficiently analyzing sentiment in social media data is challenging in a real-time streaming setting. Conventional Extract-Transform-Load (ETL) methods are inadequate for this challenge, which limits their applicability to processing vast volumes of Twitter data. Big data technologies such as Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable in addressing it. We therefore propose a stream ETL framework for Twitter-based sentiment analysis that leverages Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework enables data stream processing, bias detection and correction, sentiment-based analysis, and visualization of tweets' geospatial distribution. We present a set of use case studies, comprising sentiment classification and bias detection and correction, to illustrate the applicability of the proposed framework. We also present a comparative study of the performance of data stream processing, analysis, and visualization implemented with multiple big data technologies under different parameter settings. Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline when executing big data processing tasks.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2024.125523