Cluster analysis for multivariate application performance management issues

V. Precup. Cluster analysis for multivariate application performance management issues. 7, 2018.

  • Vlad-Ilarie Precup

As software systems grow in size and complexity, monitoring them becomes harder to control. Maintaining complex software systems generally implies complex reporting of the problems identified within them, which can in turn produce more detected problems than necessary. A balance between frequent problem notifications and coarse reporting is therefore essential for running software systems effectively and efficiently. This thesis tackles the business problem of reducing the overload of recurrent monitoring issues for users by systematically applying unsupervised techniques to analyze and cluster previously identified monitoring issues. We first propose a data selection and pre-processing method that prepares the monitoring data for clustering. We then choose the most suitable clustering algorithm for the problem based on a comparison experiment and develop a lightly customized version of the Wave-Hedges distance metric to compute the distance matrix between the problem samples. We present the ground truth for a restricted data set and the results of a parameter optimization technique based on external evaluation metrics. After evaluating the performance of the clustering model with the obtained parameters, we show that our method generalizes to large amounts of data, and we conclude by comparing it to the intuitive approach in the context of the business conditions imposed by our problem.
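
To illustrate the distance-matrix step mentioned in the abstract, the following is a minimal sketch of the standard Wave-Hedges distance, d(x, y) = Σ |x_i − y_i| / max(x_i, y_i), applied pairwise to non-negative feature vectors. The thesis's customization of the metric is not described in this abstract, so it is not reproduced here. The function names `wave_hedges` and `distance_matrix`, the feature values, and the convention that a term with both coordinates equal to zero contributes 0 are assumptions made for this illustration.

```python
import numpy as np


def wave_hedges(x, y):
    """Standard Wave-Hedges distance between two non-negative vectors.

    Hypothetical helper for illustration; not the customized metric
    developed in the thesis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = np.abs(x - y)
    denom = np.maximum(x, y)
    # Assumption: where both coordinates are zero, the 0/0 term counts as 0.
    terms = np.divide(diff, denom, out=np.zeros_like(diff), where=denom > 0)
    return float(terms.sum())


def distance_matrix(samples):
    """Symmetric pairwise distance matrix with a zero diagonal."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = wave_hedges(samples[i], samples[j])
    return dist


# Made-up feature vectors standing in for pre-processed problem samples.
problems = [[1, 2, 0], [2, 2, 0], [0, 4, 1]]
matrix = distance_matrix(problems)
```

A matrix of this kind can be passed to any clustering algorithm that accepts precomputed distances; which algorithm the thesis selects is not stated in the abstract.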