A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment

This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their unique architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introdu...

Bibliographic Details
Published in:	8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023
Authors:	Yahya M.H.; Ismail A.
Format:	Conference paper
Language:	English
Published:	Institute of Electrical and Electronics Engineers Inc. 2023
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56

id	2-s2.0-85189931031
author	Yahya M.H.; Ismail A.
title	A Comparative Study of HiveQL and SparkSQL Query Performance in a Cluster Environment
publishDate	2023
container_title	8th International Conference on Recent Advances and Innovations in Engineering: Empowering Computing, Analytics, and Engineering Through Digital Innovation, ICRAIE 2023
container_volume
container_issue
doi_str_mv	10.1109/ICRAIE59459.2023.10468276
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189931031&doi=10.1109%2fICRAIE59459.2023.10468276&partnerID=40&md5=ce6fe65169868af7f4858365acfffd56
description	This paper conducts a rigorous comparative analysis of query processing in a cluster environment, employing HiveQL and SparkSQL. Despite their shared SQL-like querying capabilities, their distinct architectures and optimizations yield divergent performance outcomes. HiveQL relies on MapReduce, introducing potential bottlenecks for complex queries, whereas SparkSQL leverages in-memory processing, contributing to faster execution. Consequently, a performance and scalability evaluation is undertaken, with a focus on table schemas and query complexity, providing critical insights for technology selection. The paper introduces a comprehensive performance and scalability analysis framework that shows how performance varies with table schemas and query complexity, helping data professionals select tools and refine designs in big data environments. © 2023 IEEE.
publisher	Institute of Electrical and Electronics Engineers Inc.
issn
language	English
format	Conference paper
accesstype
record_format	scopus
collection	Scopus
_version_	1809677778966020096

Similar Items