Evaluating the effectiveness of data quality framework in software engineering

Bibliographic Details
Published in: International Journal of Electrical and Computer Engineering
Main Authors: Rosli M.M.; Yusop N.S.M.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2022
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85139005826&doi=10.11591%2fijece.v12i6.pp6410-6422&partnerID=40&md5=2d487912d6fb56b49dc3504fd5bf9cdd
Description
Summary: Data quality matters in research that works with data sets, because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets it is not always clear which entities have been measured and exactly which metrics have been used, so measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a data set metamodel and a quality assessment process to evaluate data set quality. To evaluate the effectiveness of our framework, we conducted a user study. We used observations, a questionnaire, and a think-aloud approach to gain insight into the framework through participants' thought processes while they applied it. The results provide evidence that most participants successfully applied the definitions of data set category elements and the formal definitions of data quality issues to the data sets. Further work is needed to reproduce our results with more participants and to determine whether the data quality framework generalizes to other types of data sets. © 2022 Institute of Advanced Engineering and Science. All rights reserved.
ISSN: 2088-8708
DOI: 10.11591/ijece.v12i6.pp6410-6422