An approach on ETL attached data quality management

C. Lettner, R. Stumptner, K. Bokesch. An approach on ETL attached data quality management. volume 8646, pages 1-8, 9, 2014.

  • Christian Lettner
  • Reinhard Stumptner
  • Karl-Heinz Bokesch
  • L. Bellatreche
  • M. K. Mohania
Book: Data Warehousing and Knowledge Discovery - Proc. DaWaK 2014
Type: In conference proceedings
Series: Lecture Notes in Computer Science

This contribution introduces an approach to ETL-attached Data Quality Management based on an autonomous Data Quality Monitoring System. The Data Quality Monitor can be attached, via lightweight connectors, to already implemented ETL processes. It makes it possible to quantify data quality and, for instance, to suggest measures if the quality of a particular data package falls below a certain limit. The long-term vision of this approach is to correct corrupted data (semi-)automatically according to user-defined Data Quality Rules. The Data Quality Monitor is attached to an ETL process by defining "snapshot points", where the data samples to be validated are collected, and by introducing "approval points", where the ETL process can be interrupted if the input data is corrupted. Because the Data Quality Monitor is an autonomous module that is attached to ETL processes rather than embedded in them, this approach supports the division of work between ETL developers and specialized data quality engineers.
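
The interplay of snapshot points and approval points described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `DataQualityMonitor`, its methods `snapshot_point`, `quality` and `approval_point`, the exception `ApprovalRejected`, the two example rules, and the threshold of 0.75 are all hypothetical names and values chosen for illustration. The sketch assumes that quality is measured as the fraction of sampled rows that satisfy every rule, which is only one of many possible metrics.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[Row], bool]  # returns True when a row satisfies the rule


class ApprovalRejected(Exception):
    """Raised at an approval point when quality is below the limit."""


@dataclass
class DataQualityMonitor:
    """Hypothetical monitor attached to an ETL process from the outside."""
    rules: List[Rule]
    threshold: float = 0.9
    samples: Dict[str, List[Row]] = field(default_factory=dict)

    def snapshot_point(self, name: str, rows: List[Row]) -> List[Row]:
        # Store a copy of the sample; hand the rows back unchanged so the
        # ETL step itself does not need to be modified.
        self.samples[name] = [dict(r) for r in rows]
        return rows

    def quality(self, name: str) -> float:
        # Assumed metric: share of sampled rows that pass every rule.
        rows = self.samples.get(name, [])
        if not rows or not self.rules:
            return 1.0  # nothing to check counts as acceptable here
        passed = sum(1 for r in rows if all(rule(r) for rule in self.rules))
        return passed / len(rows)

    def approval_point(self, name: str) -> None:
        # Interrupt the ETL process if the sample falls below the limit.
        score = self.quality(name)
        if score < self.threshold:
            raise ApprovalRejected(
                f"{name}: quality {score:.2f} below limit {self.threshold:.2f}"
            )


# Two illustrative, user-defined data quality rules (assumptions).
def amount_present(row: Row) -> bool:
    return row.get("amount") is not None


def amount_non_negative(row: Row) -> bool:
    amount = row.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


monitor = DataQualityMonitor(
    rules=[amount_present, amount_non_negative], threshold=0.75
)

extracted = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5.0},
    {"id": 4, "amount": 7.5},
]

# Existing ETL code only gains these two calls.
rows = monitor.snapshot_point("after_extract", extracted)
try:
    monitor.approval_point("after_extract")
    status = "approved"
except ApprovalRejected:
    status = "interrupted"
```

In this sketch the ETL code only calls `snapshot_point` and `approval_point`, while the rules and the threshold live in the monitor. That separation loosely mirrors the division of work between ETL developers and data quality engineers mentioned in the abstract.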