An approach on ETL attached data quality management

Authors Christian Lettner
Reinhard Stumptner
Karl-Heinz Bokesch
Editors L. Bellatreche
M. K. Mohania
Title An approach on ETL attached data quality management
Booktitle Data Warehousing and Knowledge Discovery - Proc. DaWaK 2014
Type in proceedings
Publisher Springer
Series Lecture Notes in Computer Science
Volume 8646
ISBN 978-3-319-10159-0
Month September
Year 2014
Pages 1-8
SCCH ID# 1406

This contribution introduces an approach to ETL-attached Data Quality Management by means of an autonomous Data Quality Monitoring System. The Data Quality Monitor can be attached, via lightweight connectors, to already implemented ETL processes. It makes it possible to quantify data quality and to suggest measures when, for instance, the quality of a particular data package falls below a certain limit. Furthermore, the long-term vision of this approach is to correct corrupted data (semi-)automatically according to user-defined Data Quality Rules. The Data Quality Monitor is attached to an ETL process in two ways: by defining "snapshot points", where the data samples to be validated are collected, and by introducing "approval points", where an ETL process can be interrupted in case of corrupted input data. Because the Data Quality Monitor is an autonomous module that is attached to ETL processes rather than embedded in them, this approach supports the division of work between ETL developers and specialized data quality engineers.
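The interplay of snapshot points and approval points can be illustrated with a minimal sketch. It is not the implementation described in the paper: the class `DataQualityMonitor`, its methods `snapshot`, `quality` and `approve`, the exception `DataQualityError`, the `limit` parameter, the point name `after_extract` and the `amount` field are all hypothetical names chosen for illustration, and data quality is assumed here to be simply the share of rows that satisfy every rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]  # hypothetical rule type: True if the row is valid


class DataQualityError(Exception):
    """Raised at an approval point when quality falls below the limit."""


@dataclass
class DataQualityMonitor:
    """Illustrative monitor that lives outside the ETL code (assumed design)."""
    rules: List[Rule]
    limit: float = 0.9  # assumed minimum share of valid rows
    snapshots: Dict[str, List[Row]] = field(default_factory=dict)

    def snapshot(self, point: str, rows: Iterable[Row]) -> None:
        # Snapshot point: store a copy of the sample so later ETL steps
        # cannot change what the monitor validates.
        self.snapshots[point] = [dict(r) for r in rows]

    def quality(self, point: str) -> float:
        # Quantify quality as the share of rows passing all rules.
        rows = self.snapshots[point]
        if not rows:
            return 1.0
        valid = sum(1 for r in rows if all(rule(r) for rule in self.rules))
        return valid / len(rows)

    def approve(self, point: str) -> None:
        # Approval point: interrupt the ETL process on low quality.
        score = self.quality(point)
        if score < self.limit:
            raise DataQualityError(
                f"{point}: quality {score:.2f} is below limit {self.limit:.2f}"
            )


def run_etl(source: Iterable[Row], monitor: DataQualityMonitor) -> List[Row]:
    """Toy ETL process with one snapshot point and one approval point."""
    extracted = list(source)
    monitor.snapshot("after_extract", extracted)  # connector call 1
    monitor.approve("after_extract")              # connector call 2
    # Transform step: only reached if the approval point did not interrupt.
    return [{**r, "amount": float(r["amount"])} for r in extracted]
```

In this sketch the ETL code only contains the two connector calls, while the rules and the limit are configured on the monitor. That separation is what would let a data quality engineer change the checks without touching the ETL process itself.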