Summary: This paper presents a rigorous comparative analysis of query processing with HiveQL and SparkSQL in a cluster environment. Although both offer SQL-like querying, their distinct architectures and optimizations lead to different performance outcomes. HiveQL relies on MapReduce, which can introduce bottlenecks for complex queries, whereas SparkSQL uses in-memory processing, which speeds up execution. The paper therefore evaluates the performance and scalability of both systems, focusing on how table schemas and query complexity affect the results, to inform technology selection. To this end, it introduces a comprehensive performance and scalability analysis framework that yields detailed insights into performance differences across table schemas and query complexity. These insights help data professionals choose the right tool and refine their designs in dynamic big data environments. © 2023 IEEE.