A DaQL to monitor data quality in machine learning applications

L. Ehrlinger, V. Haunschmid, D. Palazzini, C. Lettner. A DaQL to monitor data quality in machine learning applications. In: Database and Expert Systems Applications - Proc. DEXA 2019, Part II, volume 11709, pages 227-237, DOI https://doi.org/10.1007/978-3-030-27615-7_17, August 2019.

Authors
  • Lisa Ehrlinger
  • Verena Haunschmid
  • Davide Palazzini
  • Christian Lettner
Editors
  • S. Hartmann
  • J. Küng
  • S. Chakravarthy
  • G. Anderst-Kotsis
  • o.Univ.Prof. Dipl.Ing. Dr. A Min Tjoa
  • I. Khalil
Book: Database and Expert Systems Applications - Proc. DEXA 2019, Part II
Type: In conference proceedings
Publisher: Springer
Series: Lecture Notes in Computer Science
Volume: 11709
DOI: https://doi.org/10.1007/978-3-030-27615-7_17
ISBN: 978-3-030-27614-0
Month: 8
Year: 2019
Pages: 227-237
Abstract

Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research about data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of the data quality is not sufficient since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.