Summary: | Datasets consist of data and metadata. Data are often misinterpreted because of insufficient metadata, which gives rise to quality issues such as failure to clearly identify the entity being measured and inability to clarify how the metrics were generated. We believe that establishing common agreement on the terminology and concepts used in datasets is important to ensure that the meaning of data can be interpreted correctly. We developed a dataset metamodel that describes the structure and concepts in a dataset, and the relationships between those concepts, to support a shared understanding of the content of datasets. As a preliminary evaluation, we conducted a user study, in the form of an online survey, to evaluate the effectiveness of the dataset metamodel. The survey examined how well participants understood the definitions of the dataset category elements in the metamodel and how well they could apply them to a range of datasets. We found that participants with relevant background knowledge and research experience, particularly in analysing datasets, answered more questions correctly than participants with less relevant background knowledge and research experience. The results of our survey provide evidence that our dataset metamodel can be used effectively by researchers to model datasets for analysis in software engineering. In future work, we need to reproduce the results with more appropriately sized samples of researchers in the relevant areas. © 2018 The authors and IOS Press. All rights reserved.
|