ML-PipeDebugger: A debugging tool for data processing pipelines

Authors Felix Kossak
Michael Zwick
Editors S. Hartmann
J. Küng
S. Chakravarthy
G. Anderst-Kotsis
A Min Tjoa
I. Khalil Ibrahim
Title ML-PipeDebugger: A debugging tool for data processing pipelines
Book title Database and Expert Systems Applications - Proc. DEXA 2019, Part II
Type in conference proceedings
Publisher Springer
Series Lecture Notes in Computer Science
Volume 11707
ISBN 978-3-030-27617-1
DOI 10.1007/978-3-030-27618-8_20
Month August
Year 2019
Pages 263-272
SCCH ID# 19010
Abstract

Data pre-processing for data analysis usually requires a considerable number of interdependent steps, many of which are prone to errors or liable to introduce unwanted biases. Such errors can lead to cases where predictions for similar data instances differ by an unexpectedly large amount. An important question is then where in the data processing pipeline the deviation was introduced. We present a tool that can help identify critical data processing steps in such cases, making it possible to "debug" or improve data pre-processing and model generation. More generally, the tool gives a view of how different data instances behave in relation to each other throughout a data processing pipeline. The task of identifying critical steps turns out to be surprisingly complex, mostly because features of different types and ranges have to be compared, because the required statistical measures must often be obtained from small samples, and because time series can be involved.
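
The idea of locating the step at which two similar instances drift apart can be illustrated with a minimal sketch. This is not the authors' tool or method; all function names, the toy pipeline, and the choice of a standard-deviation-scaled mean absolute difference are assumptions made purely for illustration. The sketch handles numeric features only and ignores the harder cases the abstract mentions (mixed feature types, small samples, time series).

```python
from statistics import pstdev

def scaled_distance(a, b, scales):
    """Mean absolute difference of two rows, each feature divided by its scale."""
    return sum(abs(x - y) / s for x, y, s in zip(a, b, scales)) / len(scales)

def feature_scales(data):
    """Per-feature population standard deviation; 1.0 for constant features."""
    return [pstdev(col) or 1.0 for col in zip(*data)]

def trace_divergence(steps, dataset, i, j):
    """Record the scaled distance between rows i and j on the input and after each step.

    `steps` is a list of (name, function) pairs; each function maps a dataset
    (list of numeric rows) to a new dataset with the same number of rows.
    """
    data = dataset
    trace = [("input", scaled_distance(data[i], data[j], feature_scales(data)))]
    for name, fn in steps:
        data = fn(data)
        trace.append((name, scaled_distance(data[i], data[j], feature_scales(data))))
    return trace

def most_critical_step(trace):
    """Name of the step with the largest increase in distance over its predecessor."""
    jumps = [(cur[1] - prev[1], cur[0]) for prev, cur in zip(trace, trace[1:])]
    return max(jumps)[1]

# Hypothetical toy pipeline: rows 0 and 1 are similar on input.
dataset = [[1.0, 10.0], [1.1, 10.5], [5.0, 50.0], [9.0, 90.0]]
steps = [
    ("scale", lambda d: [[2 * x, 2 * y] for x, y in d]),
    # Thresholding places the two similar rows on opposite sides of the cut.
    ("binarize", lambda d: [[1.0 if x > 2.1 else 0.0, y] for x, y in d]),
]
trace = trace_divergence(steps, dataset, 0, 1)
critical = most_critical_step(trace)
```

In this toy run, uniform scaling leaves the scaled distance between the two rows essentially unchanged, while the thresholding step separates them, so it is reported as the critical step. Dividing by the per-feature spread is one simple way to make features of different ranges comparable; other normalizations would be equally plausible.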