Noise in bug report data and the impact on defect prediction results
|Title||Noise in bug report data and the impact on defect prediction results|
|Booktitle||Proceedings of the Joint Conference of the 23rd International Workshop on Software Measurement (IWSM) and the Eighth International Conference on Software Process and Product Measurement (Mensura)|
|Mark||Best Paper Award|
The potential benefits of defect prediction have created widespread interest in research and generated a considerable number of empirical studies. Applications with real-world data revealed a central problem: real-world data is "dirty" and often of poor quality. Noise in bug report data is a particular problem for defect prediction since it affects the correct classification of software modules. Is the module actually defective or not? In this paper, we examine different causes of noise encountered when predicting defects in an industrial software system, and we provide an overview of commonly reported causes in related work. Furthermore, we conduct an experiment to explore the impact of class noise on prediction performance. The experiment shows that the prediction results for the studied system remain reliable even at a noise level of 20% probability of incorrect links between bug reports and modules.
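The class-noise experiment described in the abstract can be illustrated with a minimal sketch: flip each module's defect label with a given probability, simulating incorrect links between bug reports and modules. The function name, label layout, and the 20% rate below are illustrative assumptions, not the paper's actual setup.

```python
import random

def inject_class_noise(labels, p, seed=0):
    # Flip each binary label (1 = defective, 0 = clean) with
    # probability p, mimicking bug reports linked to the wrong module.
    rng = random.Random(seed)
    return [1 - y if rng.random() < p else y for y in labels]

# Hypothetical ground-truth labels for 1000 modules (~20% defective).
labels = [1 if i % 5 == 0 else 0 for i in range(1000)]

# Inject class noise at the 20% level studied in the paper.
noisy = inject_class_noise(labels, p=0.20, seed=42)
flipped = sum(a != b for a, b in zip(labels, noisy))
print(f"{flipped} of {len(labels)} labels flipped ({flipped / len(labels):.1%})")
```

A prediction model trained on the noisy labels can then be evaluated against the clean ones to measure how much the mislinked reports degrade its performance.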