Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies

Bibliographic Details
Published in: EXPERT SYSTEMS WITH APPLICATIONS
Main Authors: Ismail, Azlan; Sazali, Faris Haziq; Jawaddi, Siti Nuraishah Agos; Mutalib, Sofianita
Format: Article
Language: English
Published: PERGAMON-ELSEVIER SCIENCE LTD 2025
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001341067900001

Abstract
Twitter has emerged as a rich source of real-time data, providing valuable insights across domains such as healthcare and business. Sentiment analysis is crucial for understanding the public reactions and sentiments expressed on Twitter, enabling organizations to make informed decisions. However, analyzing sentiment efficiently from streaming social media data in real time remains a challenge. Conventional Extract-Transform-Load (ETL) methods are ill-suited to this setting, which limits their applicability to processing large volumes of Twitter data. Big data technologies such as Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable for addressing this challenge. Building on them, we propose a stream ETL framework for Twitter-based sentiment analysis that leverages Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework supports data stream processing, bias detection and correction, sentiment-based analysis, and visualization of the geospatial distribution of tweets. We present a set of use case studies, covering sentiment classification and bias detection and correction, to illustrate the applicability of the proposed framework. We also present a comparative study of the performance of data stream processing, analysis, and visualization implemented with multiple big data technologies under different parameter settings. Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline in executing big data processing tasks.

Volume: 261
ISSN: 0957-4174 (print); 1873-6793 (online)
DOI: 10.1016/j.eswa.2024.125523
Subjects: Computer Science; Engineering; Operations Research & Management Science
Web of Science ID: WOS:001341067900001