Automated schema quality measurement in large-scale information systems

Authors Lisa Ehrlinger
Wolfram Wöß
Editors H. Hacid
Q.Z. Sheng
T. Yoshida
A. Sarkheyli
R. Zhou
Title Automated schema quality measurement in large-scale information systems
Booktitle Data Quality and Trust in Big Data – QUAT 2018 in conjunction with WISE 2018, Revised Selected Papers
Type in book
Publisher Springer
Series Lecture Notes in Computer Science
Volume 11235
ISBN 978-3-030-19142-9
DOI 10.1007/978-3-030-19143-6_2
Month April
Year 2019
Pages 16-31
SCCH ID# 17049

Assessing the quality of information system schemas is crucial, because an unoptimized or erroneous schema design has a strong impact on the quality of the stored data; for example, it may lead to inconsistencies and anomalies at the data level. Even if the initial schema had an ideal design, changes during the life cycle can negatively affect schema quality and have to be tackled. Especially in Big Data environments, there are two major challenges. The first is large schemas, where manual verification of schema and data quality is very arduous. The second is the integration of heterogeneous schemas from different data models, whose quality cannot be compared directly. Thus, we present a domain-independent approach for automatically measuring the quality of large and heterogeneous (logical) schemas. In contrast to existing approaches, we provide a fully automatable workflow that also enables regular reassessment. Our implementation supports measuring six quality dimensions: correctness, completeness, pertinence, minimality, readability, and normalization.