Talk at MIT

Automating Data Quality Measurement with Tools

DI Lisa Ehrlinger will speak on August 2, 2019, about "Automating Data Quality Measurement with Tools" at the renowned 13th Annual MIT Chief Data Officer and Information Quality (MIT CDOIQ) Symposium at the Massachusetts Institute of Technology (MIT). The talk provides a comprehensive overview of state-of-the-art data quality (DQ) tools and identifies opportunities for extending their functionality.

Different tools

In recent years, a wide variety of commercial, open source, and academic data quality (DQ) applications with different foci have been developed. Because the range of functions these tools offer varies widely, companies are often unsure which DQ tool best suits their needs. In a systematic search, Ehrlinger and her colleagues identified 667 software tools dedicated to "data quality", of which 12 were selected for in-depth investigation by applying predefined exclusion criteria. Among others, Informatica Data Quality, Experian Pandora, Talend Open Studio, Oracle EDQ, SAS Data Quality, and Quadient Data Cleaner were investigated. The tools were evaluated against a fine-grained requirements catalog divided into three categories: (1) data profiling, (2) data quality measurement in terms of metrics, and (3) continuous data quality monitoring.

In her talk, Ehrlinger will present the strengths and weaknesses of the individual tools, based on the extent to which they fulfill the different requirements. She will also give an overview of the wide variety of DQ tools available on the market, including tools that were discovered in the systematic search but have not been covered by any existing survey so far (e.g., the Gartner Magic Quadrant for Data Quality Tools).

Research cooperation with Johannes Kepler University Linz

The research was conducted as part of Lisa Ehrlinger's PhD thesis on automated continuous data quality measurement. The thesis is supervised by a.Univ.-Prof. Dr. Wolfram Wöß from the Institute of Application-oriented Knowledge Processing (FAW) at the Johannes Kepler University Linz.

At the Software Competence Center Hagenberg, Ehrlinger is the project lead of the COMET project Sebista (Secure Big Stream Data Processing), which aims to develop novel methods for storing and processing large amounts of data and to ensure high data quality as the foundation for machine learning and artificial intelligence.