A DaQL to monitor data quality in machine learning applications

Authors Lisa Ehrlinger
Verena Haunschmid
Davide Palazzini
Christian Lettner
Editors S. Hartmann
J. Küng
S. Chakravarthy
G. Anderst-Kotsis
A Min Tjoa
I. Khalil
Title A DaQL to monitor data quality in machine learning applications
Booktitle Database and Expert Systems Applications - Proc. DEXA 2019, Part II
Type in proceedings
Publisher Springer
Series Lecture Notes in Computer Science
Volume 11709
ISBN 978-3-030-27614-0
DOI 10.1007/978-3-030-27615-7_17
Month August
Year 2019
Pages 227-237
SCCH ID# 19018

Machine learning models can only be as good as the data used to train them. Despite this obvious correlation, there is little research on data quality measurement to ensure the reliability and trustworthiness of machine learning models. Especially in industrial settings, where sensors produce large amounts of highly volatile data, a one-time measurement of data quality is not sufficient, since errors in new data should be detected as early as possible. Thus, in this paper, we present DaQL (Data Quality Library), a generally applicable tool to continuously monitor the quality of data to increase the prediction accuracy of machine learning models. We demonstrate and evaluate DaQL within an industrial real-world machine learning application at Siemens.
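The abstract does not describe DaQL's interface. As a rough illustration of what continuous monitoring means here (re-checking every new batch of sensor data instead of measuring quality once), the following minimal Python sketch computes two common data quality dimensions, completeness and range validity, for each incoming batch and flags batches that fall below a threshold. The names `check_batch`, `monitor` and `QualityReport`, the 0.95 thresholds, and the expected value range are hypothetical assumptions for illustration only; they are not DaQL's API or the paper's metrics.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Sequence, Tuple


@dataclass
class QualityReport:
    # Hypothetical report type, not part of DaQL.
    completeness: float  # share of non-missing values in the batch
    validity: float      # share of present values inside [lo, hi]
    ok: bool             # True if both metrics meet their thresholds


def check_batch(values: Sequence[Optional[float]], lo: float, hi: float,
                min_completeness: float = 0.95,
                min_validity: float = 0.95) -> QualityReport:
    """Measure completeness and range validity of one batch (assumed metrics)."""
    n = len(values)
    if n == 0:
        # An empty batch is treated as a quality failure.
        return QualityReport(0.0, 0.0, False)
    present = [v for v in values if v is not None]
    completeness = len(present) / n
    validity = (sum(lo <= v <= hi for v in present) / len(present)
                if present else 0.0)
    ok = completeness >= min_completeness and validity >= min_validity
    return QualityReport(completeness, validity, ok)


def monitor(batches: Iterable[Sequence[Optional[float]]], lo: float,
            hi: float) -> Iterator[Tuple[int, QualityReport]]:
    """Check each new batch as it arrives and yield (index, report) on failure."""
    for i, batch in enumerate(batches):
        report = check_batch(batch, lo, hi)
        if not report.ok:
            yield i, report
```

For example, with an assumed valid sensor range of 0 to 50, `list(monitor([[20.1, 21.0, 19.8, 20.5], [20.2, None, None, 85.0]], 0.0, 50.0))` flags only the second batch, which has completeness 0.5 (two missing readings) and validity 0.5 (one out-of-range reading). Because `monitor` is a generator, it can consume an unbounded stream of batches, so problems surface as soon as a bad batch arrives rather than after the next model retraining.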