Clustering: Mvtrans

Authors Michal Bechný
Title Clustering: Mvtrans
Type techreport
Number SCCH-TR-20029
Organization Software Competence Center Hagenberg GmbH
Month April
Year 2020
SCCH ID# 20029

This technical report summarizes the results of several clustering methods applied to the dataset features.csv, which describes 9823 images. Because the dataset is high-dimensional and contains many related and dependent variables, several dimension reduction techniques were applied before clustering the images: Principal Component Analysis (PCA), Independent Component Analysis (ICA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Non-negative Matrix Factorization (NMF). Subsequently, hierarchical clustering and/or density-based clustering with the DBSCAN algorithm was applied to the reduced data to find groups of images with similar characteristics. NMF provides a clustering automatically, so neither of these algorithms has to be applied to the NMF-reduced dataset.
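The remark that NMF yields a clustering "automatically" can be illustrated with a small sketch. It assumes the common convention of assigning each sample to the component with the largest coefficient in the factor matrix W. The report does not state which NMF solver or assignment rule was used, so the multiplicative-update solver, the function name nmf_cluster, and the synthetic toy data below are illustrative assumptions only.

```python
import numpy as np


def nmf_cluster(X, n_clusters, n_iter=500, seed=0, eps=1e-9):
    """Factor X ~ W @ H (all entries non-negative) and label each row of X
    by the index of its largest coefficient in W.

    Assumption: plain multiplicative updates for the Frobenius loss;
    the report does not say which solver was actually used.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_clusters)) + eps
    H = rng.random((n_clusters, n_features)) + eps
    for _ in range(n_iter):
        # Alternate the two multiplicative updates; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    labels = W.argmax(axis=1)  # cluster = dominant component per sample
    return labels, W, H


# Synthetic stand-in for features.csv: two groups with disjoint active features.
rng = np.random.default_rng(42)
group_a = np.hstack([rng.random((20, 3)) + 1.0, np.zeros((20, 3))])
group_b = np.hstack([np.zeros((20, 3)), rng.random((20, 3)) + 1.0])
X = np.vstack([group_a, group_b])

labels, W, H = nmf_cluster(X, n_clusters=2)
```

Because the cluster label is read directly from W, no separate hierarchical or DBSCAN step is needed on the NMF output. The number of components then plays the role of the number of clusters.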

The results of all methods confirmed the idea that images within a cluster are also similar with respect to characteristics mined from the image name: background, contour ID, preprocessing step, and sheet ID. These characteristics of interest were not used as input to the clustering or dimension reduction techniques; they were only used afterwards to evaluate the individual results in detail.
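One way to check a clustering against a held-out characteristic is to cross-tabulate the two and summarize the table. The sketch below does this with cluster purity. The report does not specify its evaluation measure, so purity is an assumed choice, and the cluster labels and background values are invented purely for illustration.

```python
from collections import Counter, defaultdict


def cross_tabulate(cluster_labels, attribute_values):
    """Count how often each attribute value occurs inside each cluster."""
    table = defaultdict(Counter)
    for cluster, value in zip(cluster_labels, attribute_values):
        table[cluster][value] += 1
    return table


def purity(table):
    """Fraction of samples that carry the majority attribute value of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total


# Hypothetical example: cluster labels and a "background" value per image.
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2]
backgrounds = ["white", "white", "black", "black", "black",
               "black", "white", "grey", "grey"]

table = cross_tabulate(clusters, backgrounds)
score = purity(table)
```

A purity close to 1 would indicate that clusters line up with the characteristic, even though that characteristic was never given to the clustering step. The same table can be built for contour ID, preprocessing step, or sheet ID.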