Clustering: Mvtrans

M. Bechný. Clustering: Mvtrans. number SCCH-TR-20029, 4, 2020.

  • Michal Bechný
Type: Technical report
Organisation: Software Competence Center Hagenberg GmbH

This technical report summarizes the results of several clustering methods applied to the dataset features.csv, which describes 9823 images. Because the dataset is high-dimensional and contains many related and dependent variables, several dimension reduction techniques were applied before clustering the images: Principal Component Analysis (PCA), Independent Component Analysis (ICA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and Non-negative Matrix Factorization (NMF). Subsequently, hierarchical clustering and/or density-based clustering with the DBSCAN algorithm were applied to the reduced data to find groups of images with similar characteristics. NMF provides a clustering automatically, so neither of these algorithms has to be applied to the NMF-reduced dataset.
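The two routes described above (reduce, then cluster; or let NMF assign clusters directly) might be sketched as follows. This is a minimal illustration with scikit-learn, not the report's actual code: the synthetic data stands in for features.csv, and the scaling steps, the number of components and clusters, the DBSCAN parameters, and the rule of assigning each image to its largest NMF component are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.decomposition import NMF, PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for features.csv (assumption: rows are images,
# columns are numeric features). Three well-separated groups of 100 rows.
rng = np.random.default_rng(0)
centers = rng.normal(scale=6.0, size=(3, 20))
X = np.vstack([c + rng.normal(size=(100, 20)) for c in centers])

# Route 1: dimension reduction followed by clustering on the reduced data.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Hierarchical (agglomerative) clustering; n_clusters=3 is an assumed value.
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_reduced)

# DBSCAN; eps and min_samples are assumed values. Label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_reduced)

# Route 2: NMF needs non-negative input, so features are rescaled to [0, 1].
X_nonneg = MinMaxScaler().fit_transform(X)
W = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0).fit_transform(X_nonneg)

# One common reading of "NMF clusters automatically": each row is assigned
# to the component with the largest coefficient.
nmf_labels = W.argmax(axis=1)
```

PCA could be swapped for ICA (`sklearn.decomposition.FastICA`) or t-SNE (`sklearn.manifold.TSNE`) without changing the clustering step; UMAP is available in the separate `umap-learn` package.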

The results of all methods confirmed the idea that images within a cluster are also similar with respect to characteristics extracted from the image name: background, contour ID, preprocessing step, and sheet ID. These characteristics were not used as input to the clustering or dimension reduction techniques; they were only evaluated in detail afterwards, against the individual results.
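An evaluation of this kind can be illustrated by cross-tabulating cluster labels against a characteristic parsed from each image name. The sketch below uses only the Python standard library. The file name pattern, the example names, and the helper functions `parse_name` and `crosstab` are hypothetical; the report does not state the naming scheme here.

```python
from collections import Counter, defaultdict

def parse_name(name):
    # Hypothetical pattern: <background>_<contourID>_<preprocessing>_<sheetID>.<ext>
    stem = name.rsplit(".", 1)[0]
    background, contour_id, preprocessing, sheet_id = stem.split("_")
    return {
        "background": background,
        "contour_id": contour_id,
        "preprocessing": preprocessing,
        "sheet_id": sheet_id,
    }

def crosstab(labels, names, key):
    # Count, per cluster label, how often each value of the characteristic occurs.
    table = defaultdict(Counter)
    for label, name in zip(labels, names):
        table[label][parse_name(name)[key]] += 1
    return {label: dict(counts) for label, counts in table.items()}

# Made-up image names and cluster labels, for illustration only.
names = [
    "white_c01_raw_s1.png",
    "white_c02_raw_s1.png",
    "black_c03_blur_s2.png",
    "black_c04_blur_s2.png",
]
labels = [0, 0, 1, 1]

background_by_cluster = crosstab(labels, names, "background")
```

A cluster dominated by a single value of a characteristic, as in this toy data, is the pattern that would indicate agreement between the clustering and that characteristic.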