Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies
Twitter has emerged as a rich source of real-time data, providing valuable insights across various domains such as healthcare and business. Sentiment analysis is crucial in understanding public reactions and sentiments expressed on Twitter, empowering organizations to make informed decisions. Howeve...
Published in: | EXPERT SYSTEMS WITH APPLICATIONS |
---|---|
Main Authors: | Ismail, Azlan; Sazali, Faris Haziq; Jawaddi, Siti Nuraishah Agos; Mutalib, Sofianita |
Format: | Article |
Language: | English |
Published: | PERGAMON-ELSEVIER SCIENCE LTD 2025 |
Subjects: | Computer Science; Engineering; Operations Research & Management Science |
Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001341067900001 |
author |
Ismail Azlan; Sazali Faris Haziq; Jawaddi Siti Nuraishah Agos; Mutalib Sofianita |
---|---|
spellingShingle |
Ismail Azlan; Sazali Faris Haziq; Jawaddi Siti Nuraishah Agos; Mutalib Sofianita Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies Computer Science; Engineering; Operations Research & Management Science |
author_facet |
Ismail Azlan; Sazali Faris Haziq; Jawaddi Siti Nuraishah Agos; Mutalib Sofianita |
author_sort |
Ismail |
spelling |
Ismail, Azlan; Sazali, Faris Haziq; Jawaddi, Siti Nuraishah Agos; Mutalib, Sofianita Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies EXPERT SYSTEMS WITH APPLICATIONS English Article Twitter has emerged as a rich source of real-time data, providing valuable insights across various domains such as healthcare and business. Sentiment analysis is crucial in understanding public reactions and sentiments expressed on Twitter, empowering organizations to make informed decisions. However, efficiently analyzing sentiment from social media data presents a challenge for real-time streaming. Conventional Extract-Transform-Load (ETL) methods are inadequate for this challenge, which limits their applicability in processing vast volumes of Twitter data. Big data technologies like Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable in addressing this challenge. To this end, we propose a stream ETL framework for Twitter-based sentiment analysis, leveraging Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework enables data stream processing, bias detection and correction, sentiment-based analysis, and visualization of tweets' geospatial distribution. We present a set of use case studies to illustrate the applicability of the proposed framework, comprising sentiment classification and bias detection and correction capabilities. We also present a comparative study to demonstrate the performance of data stream processing, analysis, and visualization implemented using multiple big data technologies under different parameter settings. Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline in the execution of big data processing tasks. 
PERGAMON-ELSEVIER SCIENCE LTD 0957-4174 1873-6793 2025 261 10.1016/j.eswa.2024.125523 Computer Science; Engineering; Operations Research & Management Science WOS:001341067900001 https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001341067900001 |
title |
Stream ETL framework for twitter-based sentiment analysis: Leveraging big data technologies |
container_title |
EXPERT SYSTEMS WITH APPLICATIONS |
language |
English |
format |
Article |
description |
Twitter has emerged as a rich source of real-time data, providing valuable insights across various domains such as healthcare and business. Sentiment analysis is crucial in understanding public reactions and sentiments expressed on Twitter, empowering organizations to make informed decisions. However, efficiently analyzing sentiment from social media data presents a challenge for real-time streaming. Conventional Extract-Transform-Load (ETL) methods are inadequate for this challenge, which limits their applicability in processing vast volumes of Twitter data. Big data technologies like Kafka, Spark, Hadoop HDFS, Hive, and HBase have become indispensable in addressing this challenge. To this end, we propose a stream ETL framework for Twitter-based sentiment analysis, leveraging Kafka, Spark, Cassandra, HBase, Hive, and HDFS. Our framework enables data stream processing, bias detection and correction, sentiment-based analysis, and visualization of tweets' geospatial distribution. We present a set of use case studies to illustrate the applicability of the proposed framework, comprising sentiment classification and bias detection and correction capabilities. We also present a comparative study to demonstrate the performance of data stream processing, analysis, and visualization implemented using multiple big data technologies under different parameter settings. Experimental results demonstrate the framework's scalability and the trade-off factors of the data stream processing pipeline in the execution of big data processing tasks. |
publisher |
PERGAMON-ELSEVIER SCIENCE LTD |
issn |
0957-4174 1873-6793 |
publishDate |
2025 |
container_volume |
261 |
container_issue |
|
doi_str_mv |
10.1016/j.eswa.2024.125523 |
topic |
Computer Science; Engineering; Operations Research & Management Science |
topic_facet |
Computer Science; Engineering; Operations Research & Management Science |
accesstype |
|
id |
WOS:001341067900001 |
url |
https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001341067900001 |
record_format |
wos |
collection |
Web of Science (WoS) |
_version_ |
1814778543733735424 |