Automated schema quality measurement in large-scale information systems

Authors Lisa Ehrlinger
Wolfram Wöß
Editors H. Hacid
Q.Z. Sheng
T. Yoshida
A. Sarkheyli
R. Zhou
Title Automated schema quality measurement in large-scale information systems
Book title Data Quality and Trust in Big Data – QUAT 2018 in conjunction with WISE 2018, Revised Selected Papers
Type In book
Publisher Springer
Series Lecture Notes in Computer Science
Volume 11235
ISBN 978-3-030-19142-9
DOI 10.1007/978-3-030-19143-6_2
Month April
Year 2019
Pages 16-31
SCCH ID# 17049
Abstract

Assessing the quality of information system schemas is crucial because an unoptimized or erroneous schema design strongly affects the quality of the stored data; for example, it may lead to inconsistencies and anomalies at the data level. Even if the initial schema had an ideal design, changes during the life cycle can negatively affect schema quality and must be addressed. Big Data environments in particular pose two major challenges: large schemas, for which manual verification of schema and data quality is very arduous, and the integration of heterogeneous schemas from different data models, whose quality cannot be compared directly. Thus, we present a domain-independent approach for automatically measuring the quality of large and heterogeneous (logical) schemas. In contrast to existing approaches, we provide a fully automatable workflow that also enables regular reassessment. Our implementation supports measuring the quality dimensions correctness, completeness, pertinence, minimality, readability, and normalization.