A schema readability metric for automated data quality measurement

Authors: Lisa Ehrlinger, Gudrun Huszar, Wolfram Wöß
Editors: Fritz Laux, Lisa Ehrlinger
Title: A schema readability metric for automated data quality measurement
Booktitle: Proceedings of the 11th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2019)
Type: In proceedings
Publisher: IARIA
ISBN: 978-1-61208-715-3
Month: June
Year: 2019
Pages: 4-10
SCCH ID#: 19004

Data quality measurement is a critical success factor for estimating the explanatory power of data-driven decisions. Several data quality dimensions, such as completeness, accuracy, and timeliness, have been investigated so far, and metrics for their measurement have been proposed. While most research into those dimensions refers to the data values, schema quality dimensions in general, and readability in particular, have not received sufficient attention. A poorly readable schema has a negative impact on data quality: for example, two attributes with different purposes but synonymous labels may cause attribute values to be inserted incorrectly. We therefore investigate the data quality dimension readability at the schema level and introduce a metric for its measurement. The measurement is based on a dictionary approach using a wordnet, which takes into account the semantics of the words used in the schema (e.g., attribute labels). We implemented and evaluated the schema readability metric within the data quality tool QuaIIe.
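
The abstract does not state the formula of the metric, so the following Python sketch is only a toy illustration of what a dictionary-based readability check on attribute labels could look like. It is not the authors' metric and not the QuaIIe implementation. The miniature dictionary `TOY_WORDNET`, the functions `synonymous` and `readability`, and the scoring rule (the share of labels that are known to the dictionary and not synonymous with another label) are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical miniature "wordnet": each word maps to a set of synset IDs.
# A real wordnet would be far larger; these entries are invented for the demo.
TOY_WORDNET = {
    "customer": {"client.n.01"},
    "client": {"client.n.01"},
    "name": {"name.n.01"},
    "city": {"city.n.01"},
    "price": {"price.n.01"},
    "cost": {"price.n.01"},
}


def synonymous(a, b, wordnet=TOY_WORDNET):
    """Two labels count as synonymous if they share at least one synset."""
    return bool(wordnet.get(a.lower(), set()) & wordnet.get(b.lower(), set()))


def readability(labels, wordnet=TOY_WORDNET):
    """Assumed score in [0, 1]: share of labels that are in the dictionary
    and have no synonymous sibling label. Labels are assumed to be distinct."""
    if not labels:
        return 1.0
    flagged = set()
    # Labels that are not dictionary words are treated as hard to read.
    for label in labels:
        if label.lower() not in wordnet:
            flagged.add(label)
    # Pairs of synonymous labels are ambiguous, so both are flagged.
    for a, b in combinations(labels, 2):
        if synonymous(a, b, wordnet):
            flagged.update((a, b))
    return 1.0 - len(flagged) / len(labels)


print(readability(["name", "city"]))                       # 1.0
print(readability(["customer", "client", "name", "xq7"]))  # 0.25
```

In the second call, "customer" and "client" are flagged as synonymous and "xq7" as unknown to the dictionary, so only one of four labels is counted as readable. This mirrors the abstract's example of attributes with different purposes but synonymous labels.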