Noise in bug report data and the impact on defect prediction results

Authors Rudolf Ramler, Johannes Himmelbauer
Title Noise in bug report data and the impact on defect prediction results
Booktitle Proceedings of the Joint Conference of the 23rd International Workshop on Software Measurement (IWSM) and the Eighth International Conference on Software Process and Product Measurement (Mensura)
Type in proceedings
Mark Best Paper Award
ISBN 978-0-7695-5078-7
Month October
Year 2013
Pages 173-180
SCCH ID# 1341

The potential benefits of defect prediction have created widespread interest in research and generated a considerable number of empirical studies. Applications with real-world data revealed a central problem: real-world data is "dirty" and often of poor quality. Noise in bug report data is a particular problem for defect prediction since it affects the correct classification of software modules. Is the module actually defective or not? In this paper, we examine different causes of noise encountered when predicting defects in an industrial software system, and we provide an overview of commonly reported causes in related work. Furthermore, we conduct an experiment to explore the impact of class noise on prediction performance. The experiment shows that the prediction results for the studied system remain reliable even at a noise level of 20% probability of incorrect links between bug reports and modules.
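
The abstract does not describe how the noise was injected, so the following is only a minimal sketch of one plausible reading: each link between a bug report and a module is redirected to a different module with a given probability, and the module labels (defective or not) are then derived from the links. The function names, the toy data, and the relinking rule are all assumptions made for illustration; they are not taken from the paper.

```python
import random

def add_link_noise(links, modules, p, seed=0):
    """Return a copy of `links` in which each (bug_id, module) pair is
    redirected to a different, randomly chosen module with probability p.
    Assumed noise model, not the procedure used in the paper."""
    rng = random.Random(seed)
    noisy = []
    for bug_id, module in links:
        if rng.random() < p:
            # Pick any module other than the correct one.
            module = rng.choice([m for m in modules if m != module])
        noisy.append((bug_id, module))
    return noisy

def defective_modules(links):
    """A module counts as defective if at least one bug report links to it."""
    return {module for _, module in links}

def label_disagreement(modules, clean_links, noisy_links):
    """Fraction of modules whose defective/non-defective label changes."""
    clean = defective_modules(clean_links)
    noisy = defective_modules(noisy_links)
    changed = sum((m in clean) != (m in noisy) for m in modules)
    return changed / len(modules)

# Hypothetical toy data: 50 modules, 100 bug reports concentrated in 15 modules.
modules = [f"mod{i}" for i in range(50)]
data_rng = random.Random(1)
clean_links = [(bug, data_rng.choice(modules[:15])) for bug in range(100)]

# 20% probability of an incorrect link, the noise level named in the abstract.
noisy_links = add_link_noise(clean_links, modules, p=0.2)
print(label_disagreement(modules, clean_links, noisy_links))
```

In an experiment of the kind the abstract describes, a prediction model would be trained on labels derived from the noisy links and evaluated against labels derived from the clean links, with the noise probability varied to see where performance starts to degrade.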