Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies

Bibliographic Details
Published in: Expert Systems with Applications
Authors: Ismail A.; Sazali F.H.; Agos Jawaddi S.N.; Mutalib S.
Format: Article
Language: English
Published: Elsevier Ltd 2025
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85206601204&doi=10.1016%2fj.eswa.2024.125523&partnerID=40&md5=c7f27b8970860bd0aa0c348753aad6e8
Scopus ID: 2-s2.0-85206601204
Volume: 261
DOI: 10.1016/j.eswa.2024.125523
Twitter has emerged as a rich source of real-time data, providing valuable insights across domains such as healthcare and business. Sentiment analysis is crucial to understanding the public reactions and sentiments expressed on Twitter, empowering organizations to make informed decisions. However, efficiently analyzing sentiment from social media data presents a challenge for real-time streaming. Conventional Extract-Transform-Load (ETL) methods are inadequate for this challenge, which limits their applicability in processing vast volumes of Twitter data. Big data technologies such as Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable in addressing it. We therefore propose a stream ETL framework for Twitter-based sentiment analysis that leverages Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework enables data stream processing, bias detection and correction, sentiment-based analysis, and visualization of tweets’ geospatial distribution. We present a set of use case studies, comprising sentiment classification and bias detection and correction, to illustrate the applicability of the proposed framework. We also present a comparative study of the performance of data stream processing, analysis, and visualization implemented using multiple big data technologies under different parameter settings. Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline when executing big data processing tasks. © 2024
ISSN: 0957-4174
