A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment

This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their unique architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introdu...


Bibliographic Details
Published in: 8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023
Main Authors: Yahya M.H.; Ismail A.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56
DOI: 10.1109/ICRAIE59459.2023.10468276
This paper presents a comparative analysis of query processing in a cluster environment using HiveQL and SparkSQL. Although both offer SQL-like querying, their different architectures and optimizations lead to divergent performance. HiveQL relies on MapReduce, which can introduce bottlenecks for complex queries, whereas SparkSQL uses in-memory processing, which contributes to faster execution. The paper therefore evaluates performance and scalability with a focus on table schemas and query complexity, providing insights for technology selection. It introduces a performance and scalability analysis framework that shows how performance varies with table schema and query complexity, helping data professionals choose tools and refine designs in big data environments. © 2023 IEEE.