A DaQL to monitor data quality in machine learning applications

Authors Lisa Ehrlinger
Verena Haunschmid
Davide Palazzini
Christian Lettner
Editors S. Hartmann
J. Küng
S. Chakravarthy
G. Anderst-Kotsis
A Min Tjoa
I. Khalil
Title A DaQL to monitor data quality in machine learning applications
Book title Database and Expert Systems Applications - Proc. DEXA 2019, Part II
Type in conference proceedings
Publisher Springer
Series Lecture Notes in Computer Science
Volume 11709
ISBN 978-3-030-27614-0
DOI 10.1007/978-3-030-27615-7_17
Month August
Year 2019
Pages 227-237
SCCH ID# 19018

Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research about data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.