A schema readability metric for automated data quality measurement

Authors Lisa Ehrlinger
Gudrun Huszar
Wolfram Wöß
Editors Fritz Laux
Lisa Ehrlinger
Title A schema readability metric for automated data quality measurement
Book title Proceedings of the 11th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2019)
Type In conference proceedings
Publisher IARIA
ISBN 978-1-61208-715-3
Month June
Year 2019
Pages 4-10
SCCH ID# 19004

Data quality measurement is a critical success factor for estimating the explanatory power of data-driven decisions. Several data quality dimensions, such as completeness, accuracy, and timeliness, have been investigated, and metrics for measuring them have been proposed. While most research on these dimensions refers to the data values, schema quality dimensions in general, and readability in particular, have not yet received sufficient attention. A poorly readable schema has a negative impact on data quality; for example, two attributes with different purposes but synonymous labels may cause attribute values to be inserted into the wrong attribute. We therefore investigate the data quality dimension readability at the schema level and introduce a metric for its measurement. The measurement is based on a dictionary approach using a wordnet, which takes into account the semantics of the words used in the schema (e.g., attribute labels). We implemented and evaluated the schema readability metric within the data quality tool QuaIIe.
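To make the idea of a dictionary-based readability check more concrete, the following is a minimal, hypothetical sketch; it is not the metric defined in the paper. It assumes a tiny in-memory dictionary (`TOY_WORDNET`, a made-up stand-in for a real wordnet) that maps each known word to a set of sense identifiers. A label's score is simply the fraction of its tokens found in the dictionary, and the schema score is the mean over all labels. The helper `are_synonymous` flags labels whose tokens share senses position by position, echoing the synonymous-label problem described above. All function names, sense identifiers, and the scoring rule are illustrative assumptions.

```python
import re
from statistics import mean

# HYPOTHETICAL stand-in for a wordnet: each known word maps to a set of
# sense identifiers; words sharing an identifier are treated as synonyms.
TOY_WORDNET: dict[str, set[str]] = {
    "customer": {"client.n.01"},
    "client": {"client.n.01"},
    "name": {"name.n.01"},
    "birth": {"birth.n.01"},
    "date": {"date.n.01"},
    "address": {"address.n.01"},
}


def tokenize(label: str) -> list[str]:
    """Split a label such as 'customerName' or 'birth_date' into lowercase words."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def label_readability(label: str) -> float:
    """Simplified score (an assumption): share of tokens found in the dictionary."""
    tokens = tokenize(label)
    if not tokens:
        return 0.0
    return sum(t in TOY_WORDNET for t in tokens) / len(tokens)


def schema_readability(labels: list[str]) -> float:
    """Mean label score over all schema element labels."""
    if not labels:
        raise ValueError("schema must contain at least one label")
    return mean(label_readability(label) for label in labels)


def are_synonymous(a: str, b: str) -> bool:
    """True if two different labels have pairwise identical or synonymous tokens."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb or len(ta) != len(tb):
        return False
    # Tokens match if they are equal or share at least one sense identifier.
    return all(
        x == y or bool(TOY_WORDNET.get(x, set()) & TOY_WORDNET.get(y, set()))
        for x, y in zip(ta, tb)
    )


if __name__ == "__main__":
    schema = ["customerName", "cstNm", "birth_date"]
    print(schema_readability(schema))
    print(are_synonymous("customerName", "clientName"))
```

In this toy setting, an abbreviated label such as `cstNm` scores 0.0 because neither token is a dictionary word, while `customerName` and `clientName` are flagged as synonymous. A real implementation would query an actual wordnet and would likely weight ambiguity (words with many senses) rather than use a plain hit ratio.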