A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment

Bibliographic Details
Published in: 8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023
Main Author: Yahya M.H.; Ismail A.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56
Description
Summary: This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their distinct architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introducing potential bottlenecks for complex queries, whereas SparkSQL leverages in-memory processing, contributing to faster execution. Consequently, a performance and scalability evaluation is undertaken, with a focus on table schemas and query complexity, providing critical insights for technology selection. The paper introduces a comprehensive performance and scalability analysis framework, offering insights into how performance varies with table schema and query complexity and helping data professionals select tools and refine designs in big data environments. © 2023 IEEE.
DOI:10.1109/ICRAIE59459.2023.10468276