A framework for unsupervised anomaly detection in production quality data
|School||FH OÖ, Master's degree programme in Data Science and Engineering|
In industrial production, the use of sensors, and thus the amount of data generated, is continuously increasing. This trend is part of the Industry 4.0 revolution and provides a massive source of complex sensor data. Anomalies in sensor data often carry useful information about abnormal properties of the system itself and of the entities that influence the data-generating process. To take advantage of this growing data volume, automated anomaly detection is essential. This thesis aims to develop a software tool that checks historical and current production-quality data from the data warehouse for anomalies. The tool provides continuous insight into changes in production quality. It is based on the automated detection and evaluation of statistical anomalies across a large number of time series. Each time series describes how a sensor failure rate evolves over time with respect to a particular aspect (e.g., type of failure, product information, the production process, or the measuring site). The time series and their anomalies are characterized by the sensor failure rate together with the absolute number of observed sensor tests. The thesis rests on the assumption that production-quality data can reasonably be approximated by a binomial distribution with a specific failure probability, and an outlier detection method based on this distributional assumption is proposed. To test the viability of the method, and thus the validity of the distributional assumption, standard unsupervised outlier detection techniques are implemented and evaluated. If the distributional assumption holds, the proposed method addresses the challenges inherent in anomaly detection on production-quality data and produces outlier scores that agree with the actual severity of the anomalies. In addition, a trend detection pipeline is presented that combines various state-of-the-art algorithms to detect statistically justified trends in production-quality data.
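The abstract does not state the exact scoring rule, so the following is only a minimal sketch of how a binomial-based outlier score could work, under these assumptions: each time bin is summarized as a pair (failures, tests), the failure probability p̂ is estimated by pooling the whole series, and a bin's score is the negative base-10 logarithm of the upper-tail probability P(X ≥ k) for X ~ Bin(n, p̂). The function names `log_binom_pmf`, `log_upper_tail` and `outlier_scores` are hypothetical. The sketch shows why failure rate and number of tests must be considered together: the same rate is far more surprising when it is observed over many tests.

```python
import math
from typing import List, Sequence, Tuple


def log_binom_pmf(i: int, n: int, p: float) -> float:
    """Natural log of P(X = i) for X ~ Binomial(n, p), computed via lgamma."""
    return (
        math.lgamma(n + 1)
        - math.lgamma(i + 1)
        - math.lgamma(n - i + 1)
        + i * math.log(p)
        + (n - i) * math.log1p(-p)
    )


def log_upper_tail(k: int, n: int, p: float) -> float:
    """Natural log of P(X >= k), accumulated with log-sum-exp to avoid underflow."""
    if k <= 0:
        return 0.0  # P(X >= 0) = 1
    if k > n:
        return -math.inf  # more failures than tests cannot occur
    terms = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def outlier_scores(series: Sequence[Tuple[int, int]]) -> List[float]:
    """Hypothetical scoring rule: -log10 P(X >= failures), X ~ Bin(tests, p_hat).

    Assumption for this sketch: p_hat is pooled over the whole series.
    Larger scores mean the failure count is more surprising under the model.
    """
    total_failures = sum(k for k, _ in series)
    total_tests = sum(n for _, n in series)
    if total_tests == 0:
        raise ValueError("series contains no tests")
    p_hat = total_failures / total_tests
    if not 0.0 < p_hat < 1.0:
        raise ValueError("pooled failure probability must lie strictly in (0, 1)")
    scores = []
    for failures, tests in series:
        log_tail = log_upper_tail(failures, tests, p_hat)
        # Convert ln to log10 and clamp tiny negative rounding errors to 0.
        scores.append(max(0.0, -log_tail / math.log(10)))
    return scores


if __name__ == "__main__":
    # Toy (failures, tests) pairs per day; the last two bins share a 3 % failure rate.
    daily = [(10, 1000), (11, 1000), (9, 1000), (30, 1000), (3, 100)]
    for (failures, tests), score in zip(daily, outlier_scores(daily)):
        print(f"{failures:>3}/{tests:<5} rate={failures / tests:.3f} score={score:.2f}")
```

In this toy series, the 30/1000 bin scores well above the 3/100 bin although both have a 3 % failure rate, because the binomial tail accounts for the number of tests. Pooling the whole series lets anomalous bins inflate p̂ and dampen their own scores; estimating p̂ from a reference window of past data would avoid this. Only the upper tail is scored because rising failure counts are the quality concern; a two-sided variant would also flag unusual drops.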