Improvement of existing offline models using (censored) online data

Authors Stefanie Brayer
Editors
Title Improvement of existing offline models using (censored) online data
Type Bachelor's thesis
University Fachhochschul-Bachelorstudiengang Medizin- und Bioinformatik
Month September
Year 2018
SCCH ID# 18070
Abstract

In industrial manufacturing, process optimization is important for improving the efficiency of the machine and the final product. Building machine learning models that can predict and optimize process and product parameters requires a large amount of data. Until now, these models have been trained on pre-collected (offline) data, which requires an elaborate and costly sensor setup. So-called online data should now be gathered and used as well. Such data can be available in larger quantities, but it may be censored and does not provide information as precise as the offline data. The online data, which may arrive sequentially as a stream, is used to improve the existing machine learning models while leaving the models themselves untouched. To this end, the author uses an additional online learning model that takes the online data together with the predictions of the existing offline models as inputs. Around this model, the author has developed an online learning framework for evaluating the performance of the new online model. The framework makes it possible to define which data is used for the online learning process and how the learning is performed. At the end of an online learning test run, the user receives several visualizations in which the predictive performance of the model is plotted as an error metric relative to the actual value. Some of the test runs have shown that prediction accuracy can indeed be improved by using the prediction of the newly trained online model rather than that of the existing offline model. The framework supports online data belonging to both known and unknown scenarios. Censored online data can also be used for fitting a model.
As with all machine learning problems, enough training data is needed to obtain reliable predictions on both the training and the test data sets. Overall, the conclusions drawn from online learning are primarily positive. Poor results were obtained mostly with the censored data set; fitting a model to censored data proved to be a difficult task.
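
The setup described in the abstract, an online model that receives the streamed data together with the predictions of an unchanged offline model, can be illustrated with a minimal Python sketch. It is not the framework from the thesis. The names `offline_model`, `OnlineCorrector`, `true_process` and `SENSOR_LIMIT`, the synthetic data, the linear model updated by stochastic gradient descent, and the one-sided update for right-censored observations are all assumptions made for illustration only.

```python
import random


def offline_model(x):
    """Hypothetical stand-in for the existing offline model; it is never retrained."""
    return 2.0 * x


def true_process(x):
    """Hypothetical ground truth that the offline model only approximates."""
    return 2.5 * x + 1.0


class OnlineCorrector:
    """Linear online model on the features [x, offline prediction, 1]."""

    def __init__(self, learning_rate=0.05):
        # Start by reproducing the offline prediction exactly.
        self.weights = [0.0, 1.0, 0.0]
        self.learning_rate = learning_rate

    @staticmethod
    def features(x):
        return [x, offline_model(x), 1.0]

    def predict(self, x):
        return sum(w * f for w, f in zip(self.weights, self.features(x)))

    def update(self, x, y_observed, censored=False):
        """One stochastic gradient step on the squared error for a single sample."""
        error = self.predict(x) - y_observed
        # Assumption: a right-censored value is only a lower bound on the true
        # value, so no update is made when the prediction already reaches it.
        if censored and error >= 0.0:
            return
        feats = self.features(x)
        self.weights = [
            w - self.learning_rate * error * f
            for w, f in zip(self.weights, feats)
        ]


rng = random.Random(0)
SENSOR_LIMIT = 3.0  # hypothetical saturation level of the online sensor
model = OnlineCorrector()

# Simulated stream: samples arrive one at a time, some of them censored.
for _ in range(2000):
    x = rng.random()
    y = true_process(x) + rng.gauss(0.0, 0.05)
    is_censored = y > SENSOR_LIMIT
    model.update(x, min(y, SENSOR_LIMIT), censored=is_censored)

# Compare both models against the (normally unknown) true values.
test_points = [i / 100 for i in range(101)]
offline_mae = sum(
    abs(offline_model(x) - true_process(x)) for x in test_points
) / len(test_points)
online_mae = sum(
    abs(model.predict(x) - true_process(x)) for x in test_points
) / len(test_points)
print(f"offline MAE: {offline_mae:.3f}, online MAE: {online_mae:.3f}")
```

Initializing the weights so that the online model first reproduces the offline prediction means that it starts no worse than the existing model and only departs from it as streamed data arrives. The synthetic example is deliberately benign; the abstract reports that fitting on real censored data was considerably harder.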