A schema readability metric for automated data quality measurement
|L. Ehrlinger, G. Huszar, W. Wöß. A schema readability metric for automated data quality measurement. pages 4-10, 6, 2019.|
|Book||Proceedings of the 11th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2019)|
Data quality measurement is a critical success factor for estimating the explanatory power of data-driven decisions. Several data quality dimensions, such as completeness, accuracy, and timeliness, have been investigated so far, and metrics for their measurement have been proposed. While most research into those dimensions refers to the data values, schema quality dimensions in general, and readability in particular, have not received sufficient attention. A poorly readable schema negatively affects data quality: for example, two attributes with different purposes but synonymous labels may cause attribute values to be inserted incorrectly. Thus, we specifically examine the data quality dimension of readability at the schema level and introduce a metric for its measurement. The measurement is based on a dictionary approach using a wordnet, which takes into account the semantics of the words used in the schema (e.g., attribute labels). We implemented and evaluated the schema readability metric within the data quality tool QuaIIe.
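To make the dictionary idea more concrete, the following is a minimal sketch of how schema labels could be checked against a word list with synonym information. It is not the metric defined in the paper and not the QuaIIe implementation: the tiny `TOY_WORDNET` mapping, the function names, and both checks (share of known words, and pairs of labels that differ in wording but map to the same concepts) are assumptions chosen purely for illustration.

```python
import re

# Hypothetical miniature stand-in for a wordnet: each known word maps to a
# concept identifier, and synonyms share the same identifier.
TOY_WORDNET = {
    "customer": "person.client",
    "client": "person.client",
    "name": "label.name",
    "address": "location.address",
    "price": "value.cost",
    "cost": "value.cost",
}


def tokenize(label):
    """Split a label such as 'customerName' or 'client_name' into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def known_word_ratio(labels):
    """Fraction of all label tokens that appear in the dictionary."""
    tokens = [t for label in labels for t in tokenize(label)]
    if not tokens:
        return 1.0  # assumption: an empty schema has nothing unreadable
    return sum(t in TOY_WORDNET for t in tokens) / len(tokens)


def concept_signature(label):
    """Concept identifiers for a label, or None if any token is unknown."""
    tokens = tokenize(label)
    if not tokens or any(t not in TOY_WORDNET for t in tokens):
        return None
    return tuple(TOY_WORDNET[t] for t in tokens)


def synonymous_label_pairs(labels):
    """Pairs of labels with different wording but identical concept signatures."""
    pairs = []
    for i, first in enumerate(labels):
        for second in labels[i + 1:]:
            sig_first = concept_signature(first)
            if (
                sig_first is not None
                and sig_first == concept_signature(second)
                and tokenize(first) != tokenize(second)
            ):
                pairs.append((first, second))
    return pairs


labels = ["customerName", "client_name", "price", "xq7"]
ratio = known_word_ratio(labels)
pairs = synonymous_label_pairs(labels)
print(f"known-word ratio: {ratio:.2f}")
print("synonymous label pairs:", pairs)
```

In this toy schema, `customerName` and `client_name` are flagged because their words resolve to the same concepts, which is the kind of ambiguity the abstract describes as a source of incorrectly inserted values. A real setup would replace `TOY_WORDNET` with an actual lexical database.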