Biclustering for detection of industrial specific missing data patterns

Authors Michal Bechný
Editors
Title Biclustering for detection of industrial specific missing data patterns
Type master thesis
Institution Master's Program Statistics
School Johannes Kepler University Linz
Month March
Year 2020
SCCH ID# 20026
Abstract

The main aim of this thesis is to find a suitable approach for detecting missing data patterns specific to the data of the industrial company voestalpine Stahl GmbH. Since most existing research on missing data patterns comes from the social sciences, I discuss how to analyze such patterns effectively in industrial data, which are typically of higher dimensionality. As a general way to detect missing data patterns, I suggest analyzing associations and dependencies between binary-coded variables that indicate missing values in the dataset. To detect missing data patterns specific to the company's data, I compare two existing biclustering methods: Ensemble Biclustering with the iBBiG algorithm and the Latent Block Model. Both methods are applied to two datasets provided by voestalpine Stahl GmbH for this work. Based on the evaluation of the results, I find that Ensemble Biclustering with the iBBiG algorithm is well suited to the needs of the company and performs better than the Latent Block Model at detecting industrial-specific missing data patterns.
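
The idea of binary-coded missingness indicators can be illustrated with a minimal Python sketch. It is not taken from the thesis: the toy data, the function names `missing_indicators` and `phi`, and the choice of the phi coefficient as the association measure are all assumptions made for illustration. Each value is recoded as 1 if missing and 0 otherwise, and the association between two indicator columns is then measured. A binary matrix of this kind is also the sort of input that biclustering methods for binary data operate on.

```python
import math

def missing_indicators(rows):
    """Recode a table as a 0/1 matrix: 1 where a value is missing (None or NaN)."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [[1 if is_missing(v) else 0 for v in row] for row in rows]

def phi(x, y):
    """Phi coefficient between two binary vectors (NaN if one of them is constant)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical toy data: three measured variables, None marks a missing value.
rows = [
    [1.0, None, None],
    [2.0, 5.0, 7.0],
    [None, None, None],
    [4.0, 6.0, 8.0],
    [5.0, None, None],
    [None, 3.0, 9.0],
]

indicators = missing_indicators(rows)
columns = list(zip(*indicators))  # one tuple of 0/1 flags per variable

# Variables 2 and 3 are always missing together; variable 1 is unrelated to variable 2.
print(phi(columns[1], columns[2]))
print(phi(columns[0], columns[1]))
```

In this toy example the second and third variables are missing in exactly the same rows, so their indicators are perfectly associated, whereas the first variable's missingness is unrelated to the second's. Pairwise measures of this kind become unwieldy with many variables, which is where grouping rows and columns jointly, as biclustering does, becomes attractive.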