Evaluating the effectiveness of data quality framework in software engineering

Bibliographic Details
Published in: International Journal of Electrical and Computer Engineering
Main Author: Rosli M.M.; Yusop N.S.M.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2022
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85139005826&doi=10.11591%2fijece.v12i6.pp6410-6422&partnerID=40&md5=2d487912d6fb56b49dc3504fd5bf9cdd
id 2-s2.0-85139005826
author Rosli M.M.; Yusop N.S.M.
title Evaluating the effectiveness of data quality framework in software engineering
publishDate 2022
container_title International Journal of Electrical and Computer Engineering
container_volume 12
container_issue 6
doi_str_mv 10.11591/ijece.v12i6.pp6410-6422
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85139005826&doi=10.11591%2fijece.v12i6.pp6410-6422&partnerID=40&md5=2d487912d6fb56b49dc3504fd5bf9cdd
description The quality of data is important in research working with data sets because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets, it is not always clear which entities have been measured and exactly which metrics have been used. This means that measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a dataset metamodel and a quality assessment process to evaluate the data set quality. To evaluate the effectiveness of our framework, we conducted a user study. We used observations, a questionnaire, and a think-aloud approach to provide insights into the framework through participant thought processes while applying the framework. The results of our study provide evidence that most participants successfully applied the definitions of dataset category elements and the formal definitions of data quality issues to the datasets. Further work is needed to reproduce our results with more participants, and to determine whether the data quality framework is generalizable to other types of data sets. © 2022 Institute of Advanced Engineering and Science. All rights reserved.
publisher Institute of Advanced Engineering and Science
issn 20888708
language English
format Article
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus