ML-PipeDebugger: A debugging tool for data processing pipelines
A Min Tjoa
I. Khalil Ibrahim
|Title||ML-PipeDebugger: A debugging tool for data processing pipelines|
|Booktitle||Database and Expert Systems Applications - Proc. DEXA 2019, Part II|
|Series||Lecture Notes in Computer Science|
Data pre-processing for data analysis usually requires a considerable number of interdependent steps, many of which are prone to errors or may introduce unwanted biases. Such errors can lead to cases where predictions for similar data instances differ far more than expected. An important question is then where in the data processing pipeline the deviation was introduced. We present a tool that can help identify critical data processing steps in such cases, making it possible to "debug" or improve data pre-processing and model generation. More generally, the tool gives a view of how different data instances behave in relation to each other throughout a data processing pipeline. Identifying critical steps turns out to be surprisingly complex, mostly because features of different types and ranges have to be compared, because the required statistical measures must often be obtained from small samples, and because time series can be involved.