A DaQL to monitor data quality in machine learning applications

L. Ehrlinger, V. Haunschmid, D. Palazzini, C. Lettner. A DaQL to monitor data quality in machine learning applications. Volume 11709, pages 227-237, DOI https://doi.org/10.1007/978-3-030-27615-7_17, August 2019.

Authors
  • Lisa Ehrlinger
  • Verena Haunschmid
  • Davide Palazzini
  • Christian Lettner
Editors
  • S. Hartmann
  • J. Küng
  • S. Chakravarthy
  • G. Anderst-Kotsis
  • o.Univ.Prof. Dipl.Ing. Dr. A Min Tjoa
  • I. Khalil
Book: Database and Expert Systems Applications - Proc. DEXA 2019, Part II
Type: In conference proceedings
Series: Lecture Notes in Computer Science

Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research about data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.