A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment
This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their unique architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introdu...
Published in: | 8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023 |
---|---|
Main Author: | Yahya M.H.; Ismail A. |
Format: | Conference paper |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers Inc., 2023 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56 |
id |
2-s2.0-85189931031 |
---|---|
author |
Yahya M.H.; Ismail A. |
title |
A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment |
publishDate |
2023 |
container_title |
8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023 |
container_volume |
|
container_issue |
|
doi_str_mv |
10.1109/ICRAIE59459.2023.10468276 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56 |
description |
This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their unique architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introducing potential bottlenecks for complex queries, whereas SparkSQL leverages in-memory processing, contributing to expedited execution. Consequently, a performance and scalability evaluation is undertaken, with a focus on table schemas and query complexity, providing critical insights for technology selection. The paper introduces a comprehensive performance and scalability analysis framework, furnishing nuanced insights into performance differences based on table schemas and query intricacies, thereby empowering data professionals in optimal tool selection and design refinement within the dynamic realm of big data environments. © 2023 IEEE. |
publisher |
Institute of Electrical and Electronics Engineers Inc. |
issn |
|
language |
English |
format |
Conference paper |
accesstype |
|
record_format |
scopus |
collection |
Scopus |